Abstract

The  task  of  designing  and  implementing  a  compiler  can  be  a  difficult  and  error-prone  process.  In 
this  paper,  we  present  a  new  approach  based  on  the  use  of  higher-order  abstract  syntax  and  term 
rewriting  in  a  logical  framework.  All  program  transformations,  from  parsing  to  code  generation, 
are  cleanly  isolated  and  specified  as  term  rewrites.  This  has  several  advantages.  The  correctness 
of  the  compiler  depends  solely  on  a  small  set  of  rewrite  rules  that  are  written  in  the  language  of 
formal  mathematics.  In  addition,  the  logical  framework  guarantees  the  preservation  of  scoping,  and 
it  automates  many  frequently-occurring  tasks  including  substitution  and  rewriting  strategies.  As  we 
show,  compiler  development  in  a  logical  framework  can  be  easier  than  in  a  general-purpose  language 
like  ML,  in  part  because  of  automation,  and  also  because  the  framework  provides  extensive  support 
for  examination,  validation,  and  debugging  of  the  compiler  transformations.  The  paper  is  organized 
around  a  case  study,  using  the  MetaPRL  logical  framework  to  compile  an  ML-like  language  to  Intel 
x86  assembly.  We  also  present  a  scoped  formalization  of  x86  assembly  in  which  all  registers  are 
immutable. 


1  Introduction 

The  task  of  designing  and  implementing  a  compiler  can  be  difficult  even  for  a  small  language.  There  are 
many  phases  in  the  translation  from  source  to  machine  code,  and  an  error  in  any  one  of  these  phases 
can  alter  the  semantics  of  the  generated  program.  The  use  of  programming  languages  that  provide  type 
safety,  pattern  matching,  and  automatic  storage  management  can  reduce  the  compiler’s  code  size  and 
eliminate  some  common  kinds  of  errors.  However,  many  programming  languages  that  appear  well-suited 
for compiler implementation, like ML, still do not address other issues, such as substitution and
preservation  of  scoping  in  the  compiled  program. 

In this paper, we present an alternative approach, based on the use of higher-order abstract syntax
and term rewriting in a general-purpose logical framework. All program transformations, from
parsing to code generation, are cleanly isolated and specified as term rewrites. In our system, term
rewrites specify an equivalence between two code fragments that is valid in any context. Rewrites are
bidirectional  and  neither  imply  nor  presuppose  any  particular  order  of  application.  Rewrite  application 
is  guided  by  programs  in  the  meta-language  of  the  logical  framework. 

There  are  many  advantages  to  using  formal  rewrites.  Program  scoping  and  substitution  are  managed 
implicitly  by  the  logical  framework;  it  is  not  possible  to  specify  a  program  transformation  that  modifies 
the  program  scope.  Perhaps  most  importantly,  the  correctness  of  the  compiler  is  dependent  only  on 
the  rewriting  rules.  Programs  that  guide  the  application  of  rewrites  do  not  have  to  be  trusted  because 
they  are  required  to  use  rewrites  for  all  program  transformations.  If  the  rules  can  be  validated  against 
a  program  semantics,  and  if  the  compiler  produces  a  program,  that  program  will  be  correct  relative 
to  those  semantics.  The  role  of  the  guidance  programs  is  to  ensure  that  rewrites  are  applied  in  the 
appropriate  order  so  that  the  output  of  the  compiler  contains  only  assembly. 

The  collection  of  rewrites  needed  to  implement  a  compiler  is  small  (hundreds  of  lines  of  formal 
mathematics)  compared  to  the  entire  code  base  of  a  typical  compiler  (often  more  than  tens  of  thousands 
of  lines  of  code  in  a  general-purpose  programming  language).  Validation  of  the  former  set  is  clearly 
easier.  Even  if  the  rewrite  rules  are  not  validated,  it  becomes  easier  to  assign  accountability  to  individual 
rules. 

The  use  of  a  logical  framework  has  another  major  advantage  that  we  explore  in  this  paper:  in  many 
cases  it  is  easier  to  implement  the  compiler,  for  several  reasons.  The  terminology  of  rewrites  corresponds 
closely  to  mathematical  descriptions  frequently  used  in  the  literature,  decreasing  time  from  concept 
to  implementation.  The  logical  framework  provides  a  great  deal  of  automation,  including  efficient 
substitution and automatic α-renaming of variables to avoid capture, as well as a large selection of
rewrite  strategies  to  guide  the  application  of  program  transformations.  The  compilation  task  is  phrased 
as  a  theorem-proving  problem,  and  the  logical  framework  provides  a  means  to  examine  and  debug  the 
effects  of  the  compilation  process  interactively.  The  facilities  for  automation  and  examination  establish 
an  environment  where  it  is  easy  to  experiment  with  new  program  transformations  and  extensions  to  the 
compiler. 

In  fairness,  formal  compilation  also  has  potential  disadvantages.  The  use  of  higher-order  abstract 
syntax,  in  which  variables  in  the  programming  language  are  represented  as  variables  in  the  logical 
language,  means  that  variables  cannot  be  manipulated  directly  in  the  formal  system;  operations  that 
modify  the  program  scope,  such  as  capturing  substitution,  are  difficult  if  not  impossible  to  express 
formally.  In  addition,  global  program  transformations,  in  which  several  parts  of  a  program  are  modified 
simultaneously,  can  sometimes  be  difficult  to  express  with  term  rewriting. 

The  most  significant  impact  of  using  a  formal  system  is  that  program  representations  must  permit 
a substitution semantics. Put another way, the logical framework requires the development of functional
intermediate representations, where heap locations may be mutable, but variables are not. This
potentially has a major effect on the formalization of imperative languages, including assembly language,
where registers are no longer mutable. This seeming contradiction can be resolved, as we show
in  the  second  half  of  this  paper,  but  it  does  require  a  departure  from  the  majority  of  the  literature  on 
compilation  methods. 

In  this  paper,  we  explore  these  problems  and  show  that  formal  compiler  development  is  feasible, 
perhaps  easy.  We  do  not  specifically  address  the  problem  of  compiler  verification  in  this  paper;  our  main 
objective  is  to  develop  the  models  and  methods  needed  during  the  compilation  process.  The  format 
of  this  paper  is  organized  around  a  case  study,  where  we  develop  a  compiler  that  generates  Intel  x86 
machine code for an ML-like language using the MetaPRL logical framework. The compiler is
fully implemented and online as part of the Mojave research project [7]. This document is generated
from the program sources (MetaPRL provides a form of literate programming), and the complete source
code is available online at http://metaprl.org/ as well as in the technical report.

1.1 Organization

The  translation  from  source  code  to  assembly  is  usually  done  in  three  major  stages.  The  parsing  phase 
translates  a  source  file  (a  sequence  of  characters)  into  an  abstract  syntax  tree;  the  abstract  syntax 
is  translated  to  an  intermediate  representation;  and  the  intermediate  representation  is  translated  to 
machine  code.  The  reason  for  the  intermediate  representation  is  that  many  of  the  transformations  in 
the  compiler  can  be  stated  abstractly,  independent  of  the  source  and  machine  representations. 

The language that we are using as an example (see Section 2) is a small language similar to ML
[19]. To keep the presentation simple, the language is untyped. However, it includes higher-order and
nested  functions,  and  one  necessary  step  in  the  compilation  process  is  closure  conversion,  in  which  the 
program  is  modified  so  that  all  functions  are  closed.  The  high-level  outline  of  the  paper  is  as  follows. 


• Section 2: parsing

• Section 3: intermediate representation (IR)

• Section 4: Intel x86 assembly code generation

• Related work

Before  describing  each  of  these  stages,  we  first  introduce  the  terminology  and  syntax  of  the  formal 
system  in  which  we  define  the  program  rewrites. 


1.2  Terminology 

All  logical  syntax  is  expressed  in  the  language  of  terms.  The  general  syntax  of  all  terms  has  three  parts. 
Each  term  has  1)  an  operator-name  (like  “sum”),  which  is  a  unique  name  identifying  the  kind  of  term; 
2)  a  list  of  parameters  representing  constant  values;  and  3)  a  set  of  subterms  with  possible  variable 
bindings.  We  use  the  following  syntax  to  describe  terms: 

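$$\underbrace{\mathit{opname}}_{\text{operator name}}\ \underbrace{[p_1; \cdots; p_n]}_{\text{parameters}}\ \underbrace{\{\vec{v}_1.t_1; \cdots; \vec{v}_m.t_m\}}_{\text{subterms}}$$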


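$$\begin{array}{ll}
\text{Displayed form} & \text{Term} \\ \hline
1 & \texttt{number[1]\{\}} \\
\lambda x.b & \texttt{lambda[]\{x. b\}} \\
f(a) & \texttt{apply[]\{f; a\}} \\
x + y & \texttt{sum[]\{x; y\}}
\end{array}$$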

A  few  examples  are  shown  in  the  table.  Numbers  have  an  integer  parameter.  The  lambda  term 
contains  a  binding  occurrence:  the  variable  x  is  bound  in  the  subterm  b. 
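
The three-part term syntax can be mirrored directly as a data structure. The following is a minimal sketch in Python rather than in MetaPRL itself; the names `Var`, `BTerm`, `Term`, `plain`, and `show` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Param = Union[int, str]

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BTerm:
    """A subterm together with the variables it binds (possibly none)."""
    bound: Tuple[str, ...]
    body: "Node"

@dataclass(frozen=True)
class Term:
    opname: str                          # 1) operator name
    params: Tuple[Param, ...] = ()       # 2) constant parameters
    subterms: Tuple[BTerm, ...] = ()     # 3) subterms with bindings

Node = Union[Var, Term]

def plain(t: Node) -> BTerm:
    """Wrap a subterm that binds no variables."""
    return BTerm((), t)

def show(t: Node) -> str:
    """Render a term in the uniform opname[params]{subterms} syntax."""
    if isinstance(t, Var):
        return t.name
    ps = "; ".join(str(p) for p in t.params)
    subs = "; ".join(".".join(b.bound + (show(b.body),)) for b in t.subterms)
    return f"{t.opname}[{ps}]{{{subs}}}"

one = Term("number", (1,))
lam = Term("lambda", (), (BTerm(("x",), Var("b")),))
app = Term("apply", (), (plain(Var("f")), plain(Var("a"))))
```

Here `show(lam)` renders the binding occurrence as `x.b` inside the subterm list, matching the table row for the lambda term.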

Term rewrites are specified in MetaPRL using second-order variables, which explicitly define scoping
and substitution. A second-order variable pattern has the form $v[v_1; \cdots; v_n]$, which represents
an arbitrary term that may have free variables $v_1, \ldots, v_n$. The corresponding substitution has the
form $v[t_1; \cdots; t_n]$, which specifies the simultaneous, capture-avoiding substitution of terms $t_1, \ldots, t_n$ for
$v_1, \ldots, v_n$ in the term matched by $v$. For example, the rule for β-reduction is specified with the following
rewrite.

$$[\textit{beta}]\quad (\lambda x.v_1[x])\; v_2 \;\longleftrightarrow\; v_1[v_2]$$

The left-hand side of the rewrite is a pattern called the redex. The $v_1[x]$ stands for an arbitrary
term with free variable $x$, and $v_2$ is another arbitrary term. The right-hand side of the rewrite is called
the contractum. The second-order variable $v_1[v_2]$ substitutes the term matched by $v_2$ for $x$ in $v_1$. A
term rewrite specifies that any term that matches the redex can be replaced with the contractum, and
vice-versa.
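
To make the substitution in the contractum concrete, the sketch below implements capture-avoiding substitution and a single β step in Python. It is an illustration only, not MetaPRL's implementation: the tuple encoding of terms (`("var", x)`, `("num", n)`, `("lam", x, body)`, `("app", f, a)`) and the function names are assumptions made for this example.

```python
import itertools

def free_vars(t):
    """Set of variables that occur free in term t."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "num":
        return set()
    if tag == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])      # "app"

def subst(t, x, s):
    """Capture-avoiding substitution of term s for the free variable x in t."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "num":
        return t
    if tag == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]                            # "lam"
    if y == x:
        return t                                    # x is shadowed here
    if y in free_vars(s):
        # Rename the bound variable so it cannot capture a free variable of s.
        avoid = free_vars(s) | free_vars(body)
        z = next(f"{y}_{i}" for i in itertools.count()
                 if f"{y}_{i}" not in avoid)
        body = subst(body, y, ("var", z))
        y = z
    return ("lam", y, subst(body, x, s))

def beta(t):
    """Apply [beta] at the root if t matches the redex; otherwise return t."""
    if t[0] == "app" and t[1][0] == "lam":
        _, (_, x, body), arg = t
        return subst(body, x, arg)
    return t
```

Applying `beta` to the encoding of (λx.λy.x) y renames the inner bound variable before substituting, so the free y is not captured.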

Rewrites  that  are  expressed  with  second-order  notation  are  strictly  more  expressive  than  those  that 
use  the  traditional  substitution  notation.  The  following  rewrite  is  valid  in  second-order  notation. 

$$[\textit{const}]\quad (\lambda x.v[])\; 1 \;\longleftrightarrow\; (\lambda x.v[])\; 2$$

In the context $\lambda x$, the second-order variable $v[]$ matches only those terms that do not have $x$ as a free
variable. No substitution is performed; the β-reduction of both sides of the rewrite yields $v[] \longleftrightarrow v[]$,
which is valid reflexively. Normally, when a second-order variable $v[]$ has an empty free-variable set $[]$,
we omit the brackets and use the simpler notation $v$.

MetaPRL is a tactic-based prover that uses OCaml [20] as its meta-language. When a rewrite is
defined in MetaPRL, the framework creates an OCaml expression that can be used to apply the rewrite.
Code to guide the application of rewrites is written in OCaml, using a rich set of primitives provided by
MetaPRL.  MetaPRL  automates  the  construction  of  most  guidance  code;  we  describe  rewrite  strategies 
only  when  necessary.  For  clarity,  we  will  describe  syntax  and  rewrites  using  the  displayed  forms  of 
terms. 

The compilation process is expressed in MetaPRL as a judgment of the form $\Gamma \vdash \mathit{compilable}(e)$,
which states that the program $e$ is compilable in any logical context $\Gamma$. The meaning of the $\mathit{compilable}(e)$
judgment is defined by the target architecture. A program $e'$ is compilable if it is a sequence of valid
assembly instructions. The compilation task is a process of rewriting the source program $e$ to an
equivalent assembly program $e'$.

2  Parsing 

In  order  to  use  the  formal  system  for  program  transformation,  source-level  programs  expressed  as 
sequences  of  characters  must  first  be  translated  into  a  term  representation  for  use  in  the  MetaPRL 
framework.  We  assume  that  the  source  language  can  be  specified  using  a  context-free  grammar,  and 
traditional  lexing  and  parsing  methods  can  be  used  to  perform  the  translation. 

MetaPRL  provides  these  capabilities  using  the  integrated  Phobos  [3]  generic  lexer  and  parser,  which 
enables  users  to  specify  parts  of  their  logical  theories  using  their  own  notation.  For  instance,  we  can 
use  actual  program  notation  (instead  of  the  uniform  term  syntax)  to  express  program  transformations 
in  rewrite  rules  and  we  can  specify  test  programs  in  source  notation. 

A  Phobos  language  specification  resembles  a  typical  parser  definition  in  YACC  [9],  except  that 
semantic  actions  for  productions  use  term  rewriting.  Phobos  employs  informal  rewriting,  which  means 
that  it  uses  a  rewriting  engine  that  can  create  new  variable  bindings  and  perform  capturing  substitution. 

In Phobos, the lexer for a language is specified as a set of lexical rewrite rules of the form $\mathit{regex} \longleftrightarrow \mathit{term}$,
where regex is a special term that is created for every token and contains the matched input as
a string parameter as well as a subterm containing the position in the source text, which can be used
to produce more informative messages if an error is detected. The following example demonstrates a
single lexer clause that translates a nonnegative decimal number to a term with operator name number
and a single integer parameter.

$$\texttt{NUM} = \texttt{"[0-9]+"}\quad \{\mathit{token}[i]\{\mathit{pos}\} \longleftrightarrow \mathit{number}[i]\}$$
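
The effect of this lexer clause can be approximated in a few lines of Python. This is only a sketch of the idea, not Phobos itself: the function name `lex_numbers` and the pair encoding of a term with its source position are assumptions for the illustration.

```python
import re

# Same regular expression as the NUM clause.
NUM = re.compile(r"[0-9]+")

def lex_numbers(source):
    """Return (term, position) pairs, one for each NUM token in source."""
    tokens = []
    for m in NUM.finditer(source):
        # token[i]{pos} <--> number[i]: the matched text becomes the
        # integer parameter of a term with operator name "number".
        term = ("number", (int(m.group()),))
        tokens.append((term, m.start()))
    return tokens
```

The position is kept alongside the term so that later phases could report errors against the source text, as the paragraph above describes.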

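$$\begin{array}{rcll}
\mathit{op} & ::= & + \mid - \mid * \mid / & \text{Binary operators} \\
 & \mid & = \;\mid\; <> \;\mid\; < \;\mid\; \le \;\mid\; > \;\mid\; \ge & \\
e & ::= & \top \mid \bot & \text{Booleans} \\
 & \mid & i & \text{Integers} \\
 & \mid & v & \text{Variables} \\
 & \mid & e\ \mathit{op}\ e & \text{Binary expressions} \\
 & \mid & \lambda v.e & \text{Anonymous functions} \\
 & \mid & \mathbf{if}\ e\ \mathbf{then}\ e\ \mathbf{else}\ e & \text{Conditionals} \\
 & \mid & e[e] & \text{Subscripting} \\
 & \mid & e[e] \leftarrow e & \text{Assignment} \\
 & \mid & e;\ e & \text{Sequencing} \\
 & \mid & e(e_1, \ldots, e_n) & \text{Application} \\
 & \mid & \mathbf{let}\ v = e\ \mathbf{in}\ e & \text{Let definitions} \\
 & \mid & \mathbf{let\ rec}\ f_1(v_1, \ldots, v_n) = e & \text{Recursive functions} \\
 & & \mathbf{and}\ f_n(v_1, \ldots, v_n) = e &
\end{array}$$

Figure 1: Program syntax

The parser is defined as a set of grammar productions. For each grammar production in the program
syntax shown in Figure 1, we define a production in the form

$$S ::= S_1 \ldots S_n \longleftrightarrow \mathit{term}$$

where the symbols $S_i$ may be annotated with a term pattern. For instance, the production for the
let-expression is defined with the following production and semantic action.

$$\begin{array}{l}
\mathit{exp} ::= \texttt{LET}\ \texttt{ID}(v)\ \texttt{EQ}\ \mathit{exp}(e)\ \texttt{IN}\ \mathit{exp}(\mathit{rest}) \\
\quad \longleftrightarrow\ \mathbf{let}\ v = e\ \mathbf{in}\ \mathit{rest}
\end{array}$$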

Phobos constructs an LALR(1) parser from these specifications that maintains a stack of terms and
applies  the  associated  rewrite  rule  each  time  a  production  is  reduced  by  replacing  the  corresponding 
terms  on  the  stack  with  the  result.  For  the  parser  to  accept,  the  stack  must  contain  a  single  term 
corresponding  to  the  start  symbol  of  the  grammar. 

It  may  not  be  feasible  during  parsing  to  create  the  initial  binding  structure  of  the  programs.  For 
instance,  in  our  implementation  function  parameters  are  collected  as  a  list  and  are  not  initially  bound  in 
the  function  body.  Furthermore,  for  mutually  recursive  functions,  the  function  variables  are  not  initially 
bound in the functions’ bodies. For this reason, the parsing phase is usually followed by an additional
rewrite  phase  that  performs  these  operations  using  the  informal  rewriting  engine.  The  source  text  is 
replaced  with  the  resulting  term  on  completion. 

3  Intermediate  representation 

The  intermediate  representation  of  the  program  must  serve  two  conflicting  purposes.  It  should  be  a 
fairly  low-level  language  so  that  translation  to  machine  code  is  as  straightforward  as  possible.  However, 
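it should be abstract enough that program transformations and optimizations need not be overly concerned
with implementation detail. The intermediate representation we use is similar to the functional
intermediate representations used by several groups, in which the language retains a similarity
to an ML-like language where all intermediate values apart from arithmetic expressions are explicitly
named.

$$\begin{array}{rcll}
\mathit{binop} & ::= & + \mid - \mid * \mid / & \text{Binary arithmetic} \\
\mathit{relop} & ::= & = \;\mid\; <> \;\mid\; < \;\mid\; \le \;\mid\; > \;\mid\; \ge & \text{Binary relations} \\
l & ::= & \mathit{string} & \text{Function label} \\
a & ::= & \top \mid \bot & \text{Boolean values} \\
 & \mid & i & \text{Integers} \\
 & \mid & v & \text{Variables} \\
 & \mid & a_1\ \mathit{binop}\ a_2 & \text{Binary arithmetic} \\
 & \mid & a_1\ \mathit{relop}\ a_2 & \text{Binary relations} \\
 & \mid & R.l & \text{Function labels} \\
e & ::= & \mathbf{let}\ v = a\ \mathbf{in}\ e & \text{Variable definition} \\
 & \mid & \mathbf{if}\ a\ \mathbf{then}\ e_1\ \mathbf{else}\ e_2 & \text{Conditional} \\
 & \mid & \mathbf{let}\ v = (a_1, \ldots, a_n)\ \mathbf{in}\ e & \text{Tuple allocation} \\
 & \mid & \mathbf{let}\ v = a_1[a_2]\ \mathbf{in}\ e & \text{Subscripting} \\
 & \mid & a_1[a_2] \leftarrow a_3;\ e & \text{Assignment} \\
 & \mid & \mathbf{let}\ v = a(a_1, \ldots, a_n)\ \mathbf{in}\ e & \text{Function application} \\
 & \mid & \mathbf{letc}\ v = a_1(a_2)\ \mathbf{in}\ e & \text{Closure creation} \\
 & \mid & \mathbf{return}\ a & \text{Return a value} \\
 & \mid & a(a_1, \ldots, a_n) & \text{Tail-call} \\
 & \mid & \mathbf{let\ rec}\ R = d\ \mathbf{in}\ e & \text{Recursive functions} \\
e_\lambda & ::= & \lambda v.e_\lambda \mid \lambda v.e & \text{Functions} \\
d & ::= & \mathbf{fun}\ l = e_\lambda\ \mathbf{and}\ d & \text{Function definitions} \\
 & \mid & \varepsilon &
\end{array}$$

Figure 2: Intermediate Representation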

In this form, the IR is partitioned into two main parts: “atoms” define values like numbers, arithmetic,
and variables; and “expressions” define all other computation. The language includes arithmetic,
conditionals, tuples, functions, and function definitions, as shown in Figure 2.

Function  definitions  deserve  special  mention.  Functions  are  defined  using  the  let  rec  R  =  d  in  e 
term,  where  d  is  a  list  of  mutually  recursive  functions,  and  R  represents  a  recursively  defined  record 
containing  these  functions.  Each  of  the  functions  is  labeled,  and  the  term  R.l  represents  the  function 
with  label  l  in  record  R. 

While this representation has an easy formal interpretation as a fixpoint of the single variable R, it
is awkward to use, principally because it violates the rule of higher-order abstract syntax: namely, that
(function) variables be represented as variables in the meta-language. In some sense, this representation
is an artifact of the MetaPRL term language: it is not possible, given the term language described in
Section 1.2, to define more than one recursive variable at a time. We are currently investigating
extending the meta-language to address this problem.

3.1 AST to IR conversion


The main difference between the abstract syntax representation and the IR is that intermediate expressions
in the AST do not have to be named. In addition, the conditional in the AST can be used
anywhere  an  expression  can  be  used  (for  instance,  as  the  argument  to  a  function),  while  in  the  IR,  the 
branches  of  the  conditional  must  be  terminated  by  a  return  a  expression  or  tail-call. 

The translation from AST to IR is straightforward, but we use it to illustrate a style of translation we
use frequently. The term $\mathit{IR}\{e_1; v.e_2[v]\}$ (displayed as $[\![ e_1 ]\!]_{IR}\, v.e_2[v]$) is the translation of an expression
$e_1$ to an IR atom, which is substituted for $v$ in expression $e_2[v]$. The translation problem is expressed
through the following rule, which states that a program $e$ is compilable if the program can be translated
to an atom, returning the value as the result of the program.

$$\frac{\Gamma \vdash \mathit{compilable}([\![ e ]\!]_{IR}\, v.\,\mathbf{return}\ v)}{\Gamma \vdash \mathit{compilable}(e)}$$

For  many  AST  expressions,  the  translation  to  IR  is  straightforward.  The  following  rules  give  a  few 
representative  examples. 


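$$\begin{array}{lrcl}
[\textit{int}] & [\![ i ]\!]_{IR}\, v.e[v] & \longleftrightarrow & e[i] \\
[\textit{var}] & [\![ v_1 ]\!]_{IR}\, v_2.e[v_2] & \longleftrightarrow & e[v_1] \\
[\textit{add}] & [\![ e_1 + e_2 ]\!]_{IR}\, v.e[v] & \longleftrightarrow & [\![ e_1 ]\!]_{IR}\, v_1.[\![ e_2 ]\!]_{IR}\, v_2.e[v_1 + v_2] \\
[\textit{set}] & [\![ e_1[e_2] \leftarrow e_3 ]\!]_{IR}\, v.e_4[v] & \longleftrightarrow & [\![ e_1 ]\!]_{IR}\, v_1. \\
 & & & [\![ e_2 ]\!]_{IR}\, v_2. \\
 & & & [\![ e_3 ]\!]_{IR}\, v_3. \\
 & & & v_1[v_2] \leftarrow v_3; \\
 & & & e_4[\top]
\end{array}$$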
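
The translation term takes "the rest of the program" as a bound subterm, which corresponds to passing a function in an ordinary programming language. The following Python sketch mimics the int, var, add, and set rules under that reading. It is not the MetaPRL code: the tuple encoding of AST and IR nodes and the names `to_ir` and `compile_program` are assumptions made for this illustration.

```python
def to_ir(e, k):
    """Translate AST expression e to an IR atom and pass that atom to k.

    k plays the role of the bound subterm v.e'[v] in the translation term.
    """
    if isinstance(e, (bool, int, str)):        # [int], [var]: already atoms
        return k(e)
    tag = e[0]
    if tag == "add":                            # [add]
        return to_ir(e[1], lambda v1:
               to_ir(e[2], lambda v2: k(("+", v1, v2))))
    if tag == "set":                            # [set]: e1[e2] <- e3
        return to_ir(e[1], lambda v1:
               to_ir(e[2], lambda v2:
               to_ir(e[3], lambda v3:
                     ("set", v1, v2, v3, k(True)))))
    raise ValueError(f"unsupported AST node: {tag!r}")

def compile_program(e):
    """Top-level rule: translate e to an atom and return it."""
    return to_ir(e, lambda v: ("return", v))
```

For example, `compile_program(("add", 1, ("add", "x", 2)))` yields a single `return` of a nested arithmetic atom, since arithmetic is the one kind of intermediate value the IR does not require to be named.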

For  conditionals,  code  duplication  is  avoided  by  wrapping  the  code  after  the  conditional  in  a  function, 
and  calling  the  function  at  the  tail  of  each  branch  of  the  conditional. 

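$$\begin{array}{lrcl}
[\textit{if}] & [\![ \mathbf{if}\ e_1\ \mathbf{then}\ e_2\ \mathbf{else}\ e_3 ]\!]_{IR}\, v.e_4[v] & \longleftrightarrow & \mathbf{let\ rec}\ R = \mathbf{fun}\ g = \lambda v.e_4[v]\ \mathbf{and}\ \varepsilon\ \mathbf{in} \\
 & & & [\![ e_1 ]\!]_{IR}\, v_1. \\
 & & & \mathbf{if}\ v_1\ \mathbf{then}\ [\![ e_2 ]\!]_{IR}\, v_2.(R.g(v_2))\ \mathbf{else}\ [\![ e_3 ]\!]_{IR}\, v_3.(R.g(v_3))
\end{array}$$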

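For functions, the post-processing phase converts recursive function definitions to the record form,
and we have the following translation, using the term $[\![ d ]\!]_{IR}$ to translate function definitions. In general,
anonymous functions must be named except when they are outermost in a function definition. The
post-processing phase produces two kinds of λ-abstractions: $\lambda_p v.e[v]$ is used to label function parameters
in recursive definitions, and $\lambda v.e[v]$ is used for anonymous functions.

$$\begin{array}{lrcl}
[\textit{letrec}] & [\![ \mathbf{let\ rec}\ R = d\ \mathbf{in}\ e_1 ]\!]_{IR}\, v.e_2[v] & \longleftrightarrow & \mathbf{let\ rec}\ R = [\![ d ]\!]_{IR}\ \mathbf{in}\ [\![ e_1 ]\!]_{IR}\, v.e_2[v] \\
[\textit{fun}] & [\![ \mathbf{fun}\ l = e\ \mathbf{and}\ d ]\!]_{IR} & \longleftrightarrow & \mathbf{fun}\ l = [\![ e ]\!]_{IR}\, v.\,\mathbf{return}\ v\ \mathbf{and}\ [\![ d ]\!]_{IR} \\
[\textit{param}] & [\![ \lambda_p v_1.e_1[v_1] ]\!]_{IR}\, v_2.e_2[v_2] & \longleftrightarrow & \lambda v_1.([\![ e_1[v_1] ]\!]_{IR}\, v_2.e_2[v_2]) \\
[\textit{abs}] & [\![ \lambda v_1.e_1[v_1] ]\!]_{IR}\, v_2.e_2[v_2] & \longleftrightarrow & \mathbf{let\ rec}\ R = \\
 & & & \quad \mathbf{fun}\ g = \lambda v_1.[\![ e_1[v_1] ]\!]_{IR}\, v_3.\,\mathbf{return}\ v_3\ \mathbf{and}\ \varepsilon \\
 & & & \mathbf{in}\ e_2[R.g]
\end{array}$$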

3.2  CPS  conversion 

CPS  conversion  is  an  optional  phase  of  the  compiler  that  converts  the  program  to  continuation-passing 
style.  That  is,  instead  of  returning  a  value,  functions  pass  their  results  to  a  continuation  function 
that is passed as an argument. In this phase, all function calls become tail-calls, and all occurrences of
$\mathbf{let}\ v = a_1(a_2)\ \mathbf{in}\ e$ and $\mathbf{return}\ a$ are eliminated. The main objective in CPS conversion is to pass the
result of the computation to a continuation function. We state this formally as the following inference
rule, which states that a program $e$ is compilable if for all functions $c$, the program $[\![ e ]\!]_c$ is compilable.

$$\frac{\Gamma,\ c\colon \mathit{exp} \vdash \mathit{compilable}([\![ e ]\!]_c)}{\Gamma \vdash \mathit{compilable}(e)}$$

The  term  [e]c  represents  the  application  of  the  c  function  to  the  program  e,  and  we  can  use  it  to 
transform  the  program  e  by  migrating  the  call  to  the  continuation  downward  in  the  expression  tree. 
Abstractly,  the  process  proceeds  as  follows. 

• First, replace each function definition $f = \lambda x.e[x]$ with a continuation form $f = \lambda c.\lambda x.[\![ e[x] ]\!]_c$
and simultaneously replace all occurrences of $f$ with the partial application $f[\mathit{id}]$, where id is the
identity function.

• Next, replace tail-calls $[\![ f[\mathit{id}](a_1, \ldots, a_n) ]\!]_c$ with $f(c, a_1, \ldots, a_n)$, and return statements
$[\![ \mathbf{return}\ a ]\!]_c$ with $c(a)$.

• Finally, replace inline-calls $[\![ \mathbf{let}\ v = f[\mathit{id}](a_1, \ldots, a_n)\ \mathbf{in}\ e ]\!]_c$ with the continuation-passing version
$\mathbf{let\ rec}\ R = \mathbf{fun}\ g = \lambda v.[\![ e ]\!]_c\ \mathbf{and}\ \varepsilon\ \mathbf{in}\ f(g, a_1, \ldots, a_n)$.

For  many  expressions,  CPS  conversion  is  a  straightforward  mapping  of  the  CPS  translation,  as 
shown  by  the  following  five  rules. 


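$$\begin{array}{lrcl}
[\textit{atom}] & [\![ \mathbf{let}\ v = a\ \mathbf{in}\ e[v] ]\!]_c & \longleftrightarrow & \mathbf{let}\ v = a\ \mathbf{in}\ [\![ e[v] ]\!]_c \\
[\textit{tuple}] & [\![ \mathbf{let}\ v = (a_1, \ldots, a_n)\ \mathbf{in}\ e[v] ]\!]_c & \longleftrightarrow & \mathbf{let}\ v = (a_1, \ldots, a_n)\ \mathbf{in}\ [\![ e[v] ]\!]_c \\
[\textit{letsub}] & [\![ \mathbf{let}\ v = a_1[a_2]\ \mathbf{in}\ e[v] ]\!]_c & \longleftrightarrow & \mathbf{let}\ v = a_1[a_2]\ \mathbf{in}\ [\![ e[v] ]\!]_c \\
[\textit{setsub}] & [\![ a_1[a_2] \leftarrow a_3;\ e[v] ]\!]_c & \longleftrightarrow & a_1[a_2] \leftarrow a_3;\ [\![ e[v] ]\!]_c \\
[\textit{if}] & [\![ \mathbf{if}\ a\ \mathbf{then}\ e_1\ \mathbf{else}\ e_2 ]\!]_c & \longleftrightarrow & \mathbf{if}\ a\ \mathbf{then}\ [\![ e_1 ]\!]_c\ \mathbf{else}\ [\![ e_2 ]\!]_c
\end{array}$$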
The modification of functions is the key part of the conversion. When a $\mathbf{let\ rec}\ R = d[R]\ \mathbf{in}\ e[R]$
term is converted, the goal is to add an extra continuation parameter to each of the functions in the
recursive definition. Conversion of the function definition is shown in the fundef rule, where the function
gets an extra continuation argument that is then applied to the function body.

In order to preserve the program semantics, we must then replace all occurrences of the function
with the term $f[\mathit{id}]$, which represents the partial application of the function to the identity. This step
is performed in two parts: first the letrec rule replaces all occurrences of the record variable $R$ with the
term $R[\mathit{id}]$, and then the letfun rule replaces each function variable $f$ with the term $f[\mathit{id}]$.


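$$\begin{array}{lrcl}
[\textit{letrec}] & [\![ \mathbf{let\ rec}\ R = d[R]\ \mathbf{in}\ e[R] ]\!]_c & \longleftrightarrow & \mathbf{let\ rec}\ R = [\![ d[R[\mathit{id}]] ]\!]_c\ \mathbf{in}\ [\![ e[R[\mathit{id}]] ]\!]_c \\
[\textit{fundef}] & [\![ \mathbf{fun}\ l = \lambda v.e[v]\ \mathbf{and}\ d ]\!]_c & \longleftrightarrow & \mathbf{fun}\ l = \lambda c.\lambda v.[\![ e[v] ]\!]_c\ \mathbf{and}\ [\![ d ]\!]_c \\
[\textit{enddef}] & [\![ \varepsilon ]\!]_c & \longleftrightarrow & \varepsilon \\
[\textit{letfun}] & [\![ \mathbf{let}\ v = R[\mathit{id}].l\ \mathbf{in}\ e[v] ]\!]_c & \longleftrightarrow & \mathbf{let}\ v = R.l\ \mathbf{in}\ [\![ e[v[\mathit{id}]] ]\!]_c
\end{array}$$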

Non-tail-call  function  applications  must  also  be  converted  to  continuation  passing  form,  as  shown  in 
the  apply  rule,  where  the  expression  after  the  function  call  is  wrapped  in  a  continuation  function  and 
passed  as  a  continuation  argument. 

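$$\begin{array}{lrcl}
[\textit{apply}] & [\![ \mathbf{let}\ v_2 = f[\mathit{id}](a)\ \mathbf{in}\ e[v_2] ]\!]_c & \longleftrightarrow & \mathbf{let\ rec}\ R = \mathbf{fun}\ g = \lambda v.[\![ e[v] ]\!]_c\ \mathbf{and}\ \varepsilon\ \mathbf{in} \\
 & & & \mathbf{let}\ g = R.g\ \mathbf{in}\ f(g, a)
\end{array}$$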

In the final phase of CPS conversion, we can replace return statements with a call to the continuation.
For tail-calls, we replace the partial application of the function $f[\mathit{id}]$ with an application to the
continuation.

$$\begin{array}{lrcl}
[\textit{return}] & [\![ \mathbf{return}\ a ]\!]_c & \longleftrightarrow & c(a) \\
[\textit{tailcall}] & [\![ f[\mathit{id}](a_1, \ldots, a_n) ]\!]_c & \longleftrightarrow & f(c, a_1, \ldots, a_n)
\end{array}$$
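
The expression-level rules above can be read as a recursive function that pushes the continuation toward the leaves of the expression tree. The Python sketch below illustrates the atom, if, return, tailcall, and apply rules only. It makes several simplifying assumptions that are not in the paper: IR nodes are encoded as tuples, the conversion of function definitions (and hence the $f[\mathit{id}]$ bookkeeping) is omitted, the continuation function is bound with a simplified `letrec` node instead of the record form, and the generated names `g0`, `g1`, ... are assumed not to clash with program variables.

```python
import itertools

def cps_convert(e, c):
    """Convert IR expression e so that its result is passed to continuation c."""
    fresh = itertools.count()

    def go(e):
        tag = e[0]
        if tag == "let":                          # [atom]
            _, v, a, body = e
            return ("let", v, a, go(body))
        if tag == "if":                           # [if]
            _, a, e1, e2 = e
            return ("if", a, go(e1), go(e2))
        if tag == "return":                       # [return]: call c instead
            return ("tailcall", c, (e[1],))
        if tag == "tailcall":                     # [tailcall]: pass c along
            _, f, args = e
            return ("tailcall", f, (c,) + tuple(args))
        if tag == "call":                         # [apply]: let v = f(args) in body
            _, v, f, args, body = e
            g = f"g{next(fresh)}"
            # The code after the call becomes a new continuation function g.
            return ("letrec", g, ("lam", v, go(body)),
                    ("tailcall", f, (g,) + tuple(args)))
        raise ValueError(f"unsupported IR node: {tag!r}")

    return go(e)
```

After conversion, no `return` or `call` nodes remain: every function invocation is a tail-call that carries its continuation as the first argument.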

3.3  Closure  conversion 

The  program  intermediate  representation  includes  higher-order  and  nested  functions.  The  function 
nesting  must  be  eliminated  before  code  generation,  and  the  lexical  scoping  of  function  definitions  must 
be  preserved  when  functions  are  passed  as  values.  This  phase  of  program  translation  is  normally 
accomplished  through  closure  conversion ,  where  the  free  variables  for  nested  functions  are  captured 
in an environment and passed to the function as an extra argument. The function body is modified so
that  references  to  variables  that  were  defined  outside  the  function  are  now  references  to  the  environment 
parameter.  In  addition,  when  a  function  is  passed  as  a  value,  the  function  is  paired  with  the  environment 
as  a  closure. 

The  difficult  part  of  closure  conversion  is  the  construction  of  the  environment,  and  the  modification 
of  variables  in  the  function  bodies.  We  can  formalize  closure  conversion  as  a  sequence  of  steps,  each  of 
which  preserves  the  program’s  semantics.  In  the  first  step,  we  must  modify  each  function  definition  by 
adding a new environment parameter. To represent this, we replace each $\mathbf{let\ rec}\ R = d\ \mathbf{in}\ e$ term in
the program with a new term $\mathbf{let\ rec}\ R\ \mathbf{with}\ [f = ()] = d\ \mathbf{in}\ e$, where $f$ is an additional parameter,
initialized to the empty tuple $()$, to be added to each function definition. Simultaneously, we replace
every occurrence of the record variable $R$ with $R(f)$, which represents the partial application of the
record $R$ to the tuple $f$.


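$$\begin{array}{lrcl}
[\textit{frame}] & \mathbf{let\ rec}\ R = d[R]\ \mathbf{in}\ e[R] & \longleftrightarrow & \mathbf{let\ rec}\ R\ \mathbf{with}\ [f = ()] = d[R(f)]\ \mathbf{in}\ e[R(f)]
\end{array}$$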

The second part of closure conversion performs the closure operation in two steps. For the first
step, suppose we have some expression $e$ with a free variable $v$. We can abstract this variable using
a call-by-name function application as the expression $\mathbf{let}\ v = v\ \mathbf{in}\ e$, which reduces to $e$ by simple
β-reduction.

$$[\textit{abs}]\quad e[v] \;\longleftrightarrow\; \mathbf{let}\ v = v\ \mathbf{in}\ e[v]$$

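By selectively applying this rule, we can quantify variables that occur free in the function definitions
$d$ in a term $\mathbf{let\ rec}\ R\ \mathbf{with}\ [f = \mathit{tuple}] = d\ \mathbf{in}\ e$. The main closure operation is the addition of the
abstracted variable to the frame, using the following rewrite.

$$\begin{array}{lrl}
[\textit{close}] & & \mathbf{let}\ v = a\ \mathbf{in} \\
 & & \mathbf{let\ rec}\ R\ \mathbf{with}\ [f = (a_1, \ldots, a_n)] = \\
 & & \quad d[R; v; f] \\
 & & \mathbf{in}\ e[R; v; f] \\
 & \longleftrightarrow & \mathbf{let\ rec}\ R\ \mathbf{with}\ [f = (a_1, \ldots, a_n, a)] = \\
 & & \quad \mathbf{let}\ v = f[n + 1]\ \mathbf{in}\ d[R; v; f] \\
 & & \mathbf{in}\ \mathbf{let}\ v = a\ \mathbf{in}\ e[R; v; f]
\end{array}$$

Once all free variables have been added to the frame, the $\mathbf{let\ rec}\ R\ \mathbf{with}\ [f = \mathit{tuple}] = d\ \mathbf{in}\ e$ term is
rewritten to use explicit tuple allocation.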

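$$\begin{array}{lrl}
[\textit{alloc}] & & \mathbf{let\ rec}\ R\ \mathbf{with}\ [f = \mathit{tuple}] = \\
 & & \quad d[R; f] \\
 & & \mathbf{in}\ e[R; f] \\
 & \longleftrightarrow & \mathbf{let\ rec}\ R = \mathit{frame}(f, d[R; f])\ \mathbf{in} \\
 & & \mathbf{let}\ f = (\mathit{tuple})\ \mathbf{in}\ e[R; f]
\end{array}$$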

The  final  step  of  closure  conversion  is  to  propagate  the  subscript  operations  into  the  function  bodies. 

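$$\begin{array}{lrl}
[\textit{arg}] & & \mathit{frame}(f, \mathbf{fun}\ l = \lambda v.e[f; v]\ \mathbf{and}\ d[f]) \\
 & \longleftrightarrow & \mathbf{fun}\ l = \lambda f.\lambda v.e[f; v]\ \mathbf{and}\ \mathit{frame}(f, d[f]) \\
[\textit{sub}] & & \mathbf{let}\ v_1 = a_1[a_2]\ \mathbf{in}\ \mathbf{fun}\ l = \lambda v_2.e[v_1; v_2]\ \mathbf{and}\ d[v_1] \\
 & \longleftrightarrow & \mathbf{fun}\ l = \lambda v_2.\ \mathbf{let}\ v_1 = a_1[a_2]\ \mathbf{in}\ e[v_1; v_2]\ \mathbf{and} \\
 & & \mathbf{let}\ v_1 = a_1[a_2]\ \mathbf{in}\ d[v_1]
\end{array}$$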
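
The net effect of these steps on a single function can be illustrated outside the rewriting framework. The Python sketch below computes the free variables of a function, collects them into a frame tuple, and rebinds each one inside the body by subscripting the frame parameter. It is a direct functional rendering of the idea, not the rewrite-based procedure of the paper: the tuple encoding, the names `free_vars` and `close`, and the 0-based frame indices are assumptions of this illustration.

```python
def free_vars(e, bound=frozenset()):
    """Free variables of e, in order of first occurrence."""
    if isinstance(e, str):
        return [] if e in bound else [e]
    if isinstance(e, int):
        return []
    if e[0] == "lam":
        _, params, body = e
        return free_vars(body, bound | set(params))
    out = []
    for sub in e[1:]:                 # e[0] is the operator tag, e.g. "+"
        for v in free_vars(sub, bound):
            if v not in out:
                out.append(v)
    return out

def close(fn, env="f"):
    """Return (closed function, frame tuple) for fn = ("lam", params, body)."""
    _, params, body = fn
    fvs = free_vars(fn)
    # Each free variable is added to the frame and rebound in the body
    # by a subscript operation on the frame parameter.
    for i, v in reversed(list(enumerate(fvs))):
        body = ("let", v, ("sub", env, i), body)
    closed = ("lam", (env,) + tuple(params), body)
    frame = ("tuple",) + tuple(fvs)
    return closed, frame
```

The closed function has no free variables other than its parameters, so it can be lifted to top level and paired with its frame wherever it is used as a value.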

3.4  IR  optimizations 

Many  optimizations  on  the  intermediate  representation  are  quite  easy  to  express.  For  illustration,  we 
include  two  very  simple  optimizations:  dead-code  elimination  and  constant  folding. 

3.4.1  Dead  code  elimination 

Formally, an expression e in a program p is dead if the removal of expression e does not change the
behavior of the program p. Complete elimination of dead code is undecidable: for example, an expression
e is dead if no program execution ever reaches expression e. The most frequent approximation is based
on scoping: a let-expression let v = a in e is dead if v is not free in e. This kind of dead-code elimination
can be specified with the following set of rewrites.


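$$\begin{array}{lrcl}
[\textit{datom}] & \mathbf{let}\ v = a\ \mathbf{in}\ e & \longleftrightarrow & e \\
[\textit{dtuple}] & \mathbf{let}\ v = (a_1, \ldots, a_n)\ \mathbf{in}\ e & \longleftrightarrow & e \\
[\textit{dsub}] & \mathbf{let}\ v = a_1[a_2]\ \mathbf{in}\ e & \longleftrightarrow & e \\
[\textit{dcl}] & \mathbf{letc}\ v = a_1(a_2)\ \mathbf{in}\ e & \longleftrightarrow & e
\end{array}$$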

The syntax of these rewrites depends on the second-order specification of substitution. Note that
the pattern e is not expressed as the second-order pattern e[v]. That is, v is not allowed to occur free
in e.

Furthermore, note that dead-code elimination of this form is aggressive. For example, suppose we
have an expression let v = a / 0 in e. This expression is considered dead code even though division
by 0 is not a valid operation. If the target architecture raises an exception on division by zero, this kind
of aggressive dead-code elimination is unsound. This problem can be addressed formally by partitioning
the class of atoms into two parts: those that may raise an exception, and those that do not, and applying
dead-code elimination only to the second class. The rules for dead-code elimination are the same as above,
where the class of atoms a refers only to those atoms that do not raise exceptions.
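The scoping-based rule and the exception-aware restriction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term classes (`Var`, `Int`, `BinOp`, `Let`) and the helper names are hypothetical and are not part of the MetaPRL implementation, and division is taken to be the only atom that may raise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str        # "+", "-", "*" or "/"
    left: object
    right: object


@dataclass(frozen=True)
class Let:
    var: str
    atom: object
    body: object   # another Let, or an atom used as the final result


def free_vars(t):
    """Free variables of an atom or a let-expression."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Int):
        return set()
    if isinstance(t, BinOp):
        return free_vars(t.left) | free_vars(t.right)
    if isinstance(t, Let):
        return free_vars(t.atom) | (free_vars(t.body) - {t.var})
    raise TypeError(f"unknown term: {t!r}")


def may_raise(a):
    """Assumption: only division (or an atom containing one) may raise."""
    if isinstance(a, BinOp):
        return a.op == "/" or may_raise(a.left) or may_raise(a.right)
    return False


def eliminate_dead(t):
    """Apply the [datom] rewrite bottom-up, skipping atoms that may raise."""
    if not isinstance(t, Let):
        return t
    body = eliminate_dead(t.body)
    if t.var not in free_vars(body) and not may_raise(t.atom):
        return body
    return Let(t.var, t.atom, body)
```

Working bottom-up means that a binding which is only used by another dead binding is removed in the same pass.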


3.4.2  Constant-folding 

Another simple class of optimizations is constant folding. If we have an expression that includes only
constant values, the expression may be computed at compile time. The following rewrite captures the
arithmetic part of this optimization, where ⟦op⟧ is the interpretation of the arithmetic operator in the
meta-language. Relations and conditionals can be folded in a similar fashion.


[binop]  i binop j               <-->  ⟦op⟧(i, j)
[relop]  i relop j               <-->  ⟦op⟧(i, j)
[ift]    if ⊤ then e1 else e2    <-->  e1
[iff]    if ⊥ then e1 else e2    <-->  e2


In order for these transformations to be faithful, the arithmetic must be performed over the numeric
set provided by the target architecture (our implementation, described in Section 4.2, uses 31-bit signed
integers).
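To make the faithfulness requirement concrete, here is a minimal Python sketch of constant folding in which every arithmetic result is reduced to a 31-bit signed value. The tuple encoding of expressions and the function names are assumptions made for this example; division is left out because folding it would have to account for division by zero.

```python
import operator

BITS = 31  # integer width of the target, as stated in the text


def wrap(n, bits=BITS):
    """Reduce n to a signed two's-complement integer of the given width."""
    n &= (1 << bits) - 1
    return n - (1 << bits) if n >> (bits - 1) else n


BINOPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
RELOPS = {"=": operator.eq, "<>": operator.ne, "<": operator.lt,
          ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def is_int(x):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(x, int) and not isinstance(x, bool)


def fold(expr):
    """Fold constants in an expression.

    Expressions are ints, bools, variable names (str), or tuples of the
    form ("binop", op, l, r), ("relop", op, l, r) or ("if", c, e1, e2).
    """
    if not isinstance(expr, tuple):
        return expr
    if expr[0] == "if":
        _, cond, e1, e2 = expr
        cond = fold(cond)
        if cond is True:            # [ift]
            return fold(e1)
        if cond is False:           # [iff]
            return fold(e2)
        return ("if", cond, fold(e1), fold(e2))
    tag, op, left, right = expr
    left, right = fold(left), fold(right)
    if is_int(left) and is_int(right):
        if tag == "binop":          # [binop], computed in 31-bit arithmetic
            return wrap(BINOPS[op](left, right))
        if tag == "relop":          # [relop]
            return RELOPS[op](left, right)
    return (tag, op, left, right)
```

Folding the largest 31-bit integer plus one with unbounded arithmetic would produce a value that the target cannot represent; with `wrap` the folded constant matches what the generated code would have computed.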

For simple constants a, it is usually more efficient to inline the let v = a in e[v] expression as well.

[cint]    let v = i in e[v]       <-->  e[i]
[cfalse]  let v = ⊥ in e[v]       <-->  e[⊥]
[ctrue]   let v = ⊤ in e[v]       <-->  e[⊤]
[cvar]    let v2 = v1 in e[v2]    <-->  e[v1]

4  Scoped  x86  assembly  language 

Once  closure  conversion  has  been  performed,  all  function  definitions  are  top-level  and  closed,  and  it 
becomes  possible  to  generate  assembly  code.  When  formalizing  the  assembly  code,  we  continue  to  use 
higher-order  abstract  syntax:  registers  and  variables  in  the  assembly  code  correspond  to  variables  in 
the meta-language. There are two important properties we must maintain. First, scoping must be
preserved:  there  must  be  a  binding  occurrence  for  each  variable  that  is  used.  Second,  in  order  to 
facilitate  reasoning  about  the  code,  variables/registers  must  be  immutable. 
These two requirements seem at odds with the traditional view of assembly, where assembly instructions
operate by side-effect on a finite register set. In addition, the Intel x86 instruction set architecture
primarily uses two-operand instructions, where the value in one operand is both used and modified in
the same instruction. For example, the instruction ADD r1, r2 performs the operation r1 ← r1 + r2,
where r1 and r2 are registers.

To address these issues, we define an abstract version of the assembly language that uses a three-operand
version of the instruction set. The instruction ADD v1, v2, λv3.e performs the abstract operation
let v3 = v1 + v2 in e. The variable v3 is a binding occurrence, and it is bound in the body e of the
instruction. In our account of the instruction set, every instruction that modifies a register has a
binding occurrence of the variable being modified. Instructions that do not modify a register (for
example, those that write to memory) use the traditional non-binding form of the instruction. For
example, the instruction ADD v1, (%v2); e performs the operation (%v2) ← (%v2) + v1, where (%v2)
means the value in memory at location v2.

The complete abstract instruction set that we use is shown in Figure 3 (the Intel x86 architecture
includes a large number of complex instructions that we do not use). Instructions may use several forms
of operands and addressing modes.

•  The immediate operand $i is a constant number i.

•  The label operand $R.l refers to the address of the function in record R labeled l.

•  The register operand %v refers to register/variable v.

•  The indirect operand (%v) refers to the value in memory at location v.

•  The indirect offset operand i(%v) refers to the value in memory at location v + i.

•  The array indexing operand i1(%v1, %v2, i2) refers to the value in memory at location v1 + v2 * i2 + i1,
where i2 ∈ {1, 2, 4, 8}.
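The three memory operand forms differ only in how the address is computed. The following Python sketch spells out that computation; the class names and the `env` mapping from variables to values are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indirect:      # (%v)
    v: str


@dataclass(frozen=True)
class Offset:        # i(%v)
    i: int
    v: str


@dataclass(frozen=True)
class Indexed:       # i1(%v1, %v2, i2)
    i1: int
    v1: str
    v2: str
    i2: int

    def __post_init__(self):
        if self.i2 not in (1, 2, 4, 8):
            raise ValueError("scale must be 1, 2, 4 or 8")


def address(operand, env):
    """Memory location denoted by a memory operand.

    env maps variable names to their current (integer) values.
    """
    if isinstance(operand, Indirect):
        return env[operand.v]
    if isinstance(operand, Offset):
        return env[operand.v] + operand.i
    if isinstance(operand, Indexed):
        return env[operand.v1] + env[operand.v2] * operand.i2 + operand.i1
    raise TypeError(f"not a memory operand: {operand!r}")
```

For instance, with `env = {"tuple": 1000, "index": 3}`, the operand `Indexed(0, "tuple", "index", 4)` denotes location 1012, the fourth word of a block starting at 1000.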

The  instructions  can  be  placed  in  several  main  categories. 

•  MOV instructions copy a value from one location to another. The instruction MOV o1, λv2.e[v2]
copies the value in operand o1 to variable v2.

•  One-operand instructions have the forms inst1 o1; e (where o1 must be an indirect operand),
and inst1 v1, λv2.e. For example, the instruction INC (%r1); e performs the operation (%r1) ←
(%r1) + 1; e; and the instruction INC %r1, λr2.e performs the operation let r2 = r1 + 1 in e.

•  Two-operand instructions have the forms inst2 o1, o2; e, where o2 must be an indirect operand;
and inst2 o1, v2, λv3.e. For example, the instruction ADD %r1, (%r2); e performs the operation
(%r2) ← (%r2) + r1; e; and the instruction ADD o1, v2, λv3.e is equivalent to let v3 = o1 + v2 in e.

•  There are two three-operand instructions: one for multiplication and one for division, having the
form inst3 o1, o2, o3, λv4, v5.e. For example, the instruction DIV %r1, %r2, %r3, λr4, r5.e performs
the following operation, where (r2, r3) is the 64-bit value r2 * 2^32 + r3. The Intel specification
requires that r4 be the register eax, and r5 the register edx.

let r4 = (r2, r3) / r1 in
let r5 = (r2, r3) mod r1 in
e
l      ::=  string                          Function labels
r      ::=  eax | ebx | ecx | edx           Registers
         |  esi | edi | esp | ebp
v      ::=  r | v1, v2, ...                 Variables
o_m    ::=  (%v)                            Memory operands
         |  i(%v)
         |  i1(%v1, %v2, i2)
o_r    ::=  %v                              Register operand
o      ::=  o_m | o_r                       General operands
         |  $i                              Constant number
         |  $v.l                            Label
cc     ::=  = | <> | < | > | ≤ | ≥          Condition codes
inst1  ::=  INC | DEC | ...                 1-operand opcodes
inst2  ::=  ADD | SUB | AND | ...           2-operand opcodes
inst3  ::=  MUL | DIV                       3-operand opcodes
cmp    ::=  CMP | TEST                      Comparisons
jmp    ::=  JMP                             Unconditional branch
jcc    ::=  JEQ | JLT | JGT | ...           Conditional branch

e      ::=  MOV o, λv.e                     Copy
         |  inst1 o_m; e                    1-operand mem inst
         |  inst1 o_r, λv.e                 1-operand reg inst
         |  inst2 o_r, o_m; e               2-operand mem inst
         |  inst2 o, o_r, λv.e              2-operand reg inst
         |  inst3 o, o_r, o_r, λv1, v2.e    3-operand reg inst
         |  cmp o1, o2                      Comparison
         |  jmp o(o_r; ...; o_r)            Unconditional branch
         |  jcc then e1 else e2             Conditional branch

p      ::=  let rec R = d in p | e          Programs
d      ::=  l = e_λ and d | ε               Function definition
e_λ    ::=  λv.e_λ | e                      Functions

Figure 3: Scoped Intel x86 instruction set
•  The comparison operation has the form CMP o1, o2; e, where the processor's condition code register
is modified by the instruction. We do not model the condition code register explicitly in our current
account. However, doing so would allow greater flexibility during code-motion optimizations
on the assembly.

•  The unconditional branch operation JMP o(o1, ..., on) branches to the function specified by
operand o, with arguments (o1, ..., on). The arguments are provided so that the calling convention
may be enforced.

•  The conditional branch operation Jcc then e1 else e2 is a conditional. If the condition code
matches the value in the processor's condition-code register, then the instruction branches to
expression e1; otherwise it branches to expression e2.

•  Functions are defined using the term let rec R = d in e, which corresponds exactly to the same
expression in the intermediate representation. The subterm d is a list of function definitions, and
e is an assembly program. Functions are defined with the term λv.e, where v is a function parameter
in instruction sequence e.
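One way to read the binding forms above is that each register instruction takes the rest of the program as a function of the value it defines. The Python sketch below models this directly with host-language lambdas, in the spirit of higher-order abstract syntax. It is an interpretation for illustration only: the function names mirror the opcodes, values are assumed to be unsigned 32-bit integers, and faults such as division by zero are not modeled.

```python
MASK = 0xFFFFFFFF  # assumption: register values are unsigned 32-bit integers


def MOV(o, body):
    """MOV o, λv.e: bind a new variable to the value of operand o."""
    return body(o & MASK)


def INC(v1, body):
    """INC %v1, λv2.e: let v2 = v1 + 1 in e."""
    return body((v1 + 1) & MASK)


def ADD(o1, v2, body):
    """ADD o1, v2, λv3.e: let v3 = o1 + v2 in e."""
    return body((o1 + v2) & MASK)


def DIV(r1, r2, r3, body):
    """DIV %r1, %r2, %r3, λr4, r5.e.

    (r2, r3) denotes the 64-bit value r2 * 2**32 + r3; r4 receives the
    quotient and r5 the remainder of dividing it by r1.
    """
    dividend = (r2 << 32) | r3
    return body(dividend // r1, dividend % r1)


# No variable is ever assigned twice: each instruction introduces a new
# binding that is in scope only in the remainder of the program.
result = MOV(10, lambda v1:
         ADD(5, v1, lambda v2:
         INC(v2, lambda v3:
         DIV(4, 0, v3, lambda q, r: (q, r)))))
```

Here `result` is the pair `(4, 0)`: 10 is copied, 5 is added, the sum is incremented to 16, and 16 is divided by 4.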


4.1  Translation  to  concrete  assembly 


Since the instruction set as defined is abstract, and contains binding structure, it must be translated
before actual generation of machine code. The first step in doing this is register allocation: every variable
in the assembly program must be assigned to an actual machine register. This step corresponds to an
α-conversion where variables are renamed to be the names of actual registers; the formal system merely
validates the renaming. We describe this phase in Section 4.3.


The  final  step  is  to  generate  the  actual  program  from  the  abstract  program.  This  requires  only  local 
modifications,  and  is  implemented  during  printing  of  the  program  (that  is,  it  is  implemented  when  the 
program  is  exported  to  an  external  assembler).  The  main  translation  is  as  follows. 


•  Memory instructions inst1 o_m; e, inst2 o_r, o_m; e, and cmp o1, o2; e can be printed directly.

•  Register instructions with binding occurrences require a possible additional mov instruction. For
the 1-operand instruction inst1 o_r, λr.e, if o_r = %r, then the instruction is implemented as
inst1 r. Otherwise, it is implemented as the two-instruction sequence:

      MOV o_r, %r
      inst1 %r

Similarly, the two-operand instruction inst2 o, o_r, λr.e may require an additional mov from o_r to
r, and the three-operand instruction inst3 o, o_r1, o_r2, λr1, r2.e may require two additional mov
instructions.

•  The JMP o(o1, ..., on) prints as JMP o. This assumes that the calling convention has been satisfied
during register allocation, and all the arguments are in the appropriate places.

•  The Jcc then e1 else e2 instruction prints as the following sequence, where cc' is the inverse of
cc, and l is a new label.

      Jcc' l
      e1
   l: e2
[false]  ⟦⊥⟧_a v.e[v]         <-->  e[$1]
[true]   ⟦⊤⟧_a v.e[v]         <-->  e[$3]
[int]    ⟦i⟧_a v.e[v]         <-->  e[$i*2 + 1]
[var]    ⟦v1⟧_a v2.e[v2]      <-->  e[%v1]
[label]  ⟦R.l⟧_a v.e[v]       <-->  e[$R.l]

[add]    ⟦a1 + a2⟧_a v.e[v]
    <-->  ⟦a1⟧_a v1.
          ⟦a2⟧_a v2.
          ADD v2, v1, λtmp.
          DEC %tmp, λsum.
          e[%sum]

[div]    ⟦a1 / a2⟧_a v.e[v]
    <-->  ⟦a1⟧_a v1.
          ⟦a2⟧_a v2.
          SAR $1, v1, λv1'.
          SAR $1, v2, λv2'.
          MOV $0, λv3'.
          DIV %v1', %v2', %v3', λq', r'.
          SHL $1, %q', λq''.
          OR $1, %q'', λq.
          e[%q]

Figure 4: Translation of atoms to x86 assembly


•  A function definition l = e and d in a record let rec R = d in e is implemented as a labeled
assembly expression R.l: e. We assume that the calling convention has been established, and the
function abstraction λv.e ignores the parameter v, assembling only the program e.
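The printing rules for binding register instructions and for conditional branches are local enough to sketch directly. The Python functions below return lists of assembly lines; their names, the textual operand format, and the `INVERSE` table of condition codes are assumptions for this example rather than the MetaPRL printer. The branch printer also assumes that the code for e1 ends in a jump, so that control never falls through into e2.

```python
# Assumed mnemonic suffixes; each condition code maps to its inverse cc'.
INVERSE = {"EQ": "NE", "NE": "EQ", "LT": "GE", "GE": "LT", "GT": "LE", "LE": "GT"}


def print_inst1(opcode, src, dst):
    """Print inst1 o_r, λr.e once register allocation has chosen dst for r.

    If the operand already lives in dst, the instruction can update it in
    place; otherwise the operand is first copied into dst.
    """
    lines = []
    if src != dst:
        lines.append(f"MOV {src}, {dst}")
    lines.append(f"{opcode} {dst}")
    return lines


def print_jcc(cc, then_lines, else_lines, label):
    """Print Jcc then e1 else e2 using the inverse condition and a new label."""
    return [f"J{INVERSE[cc]} {label}", *then_lines, f"{label}:", *else_lines]
```

For example, `print_inst1("INC", "%ebx", "%eax")` yields `["MOV %ebx, %eax", "INC %eax"]`, while `print_inst1("INC", "%eax", "%eax")` yields the single line `["INC %eax"]`.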

The  compiler  back-end  then  has  three  stages:  1)  code  generation,  2)  register  allocation,  and  3) 
peephole  optimization,  described  in  the  following  sections. 

4.2  Assembly  code  generation 

The production of assembly code is primarily a straightforward translation of operations in the
intermediate code to operations in the assembly. There are two main kinds of translations: translations from
atoms to operands, and translation of expressions into instruction sequences. We express these
translations with the term ⟦e⟧_a, which is the translation of the IR expression e to an assembly expression;
and ⟦a⟧_a v.e, which produces the assembly operand for the atom a and substitutes it for the variable v
in expression e.

4.2.1  Atom  translation 

The translation of atoms is primarily a translation between the IR names for values and the assembly names
for operands. A representative set of atom translations is shown in Figure 4. Since the language is
untyped, we use a 31-bit representation of integers, where the least-significant bit is always set to 1.
Since pointers are always word-aligned, this allows the garbage collector to differentiate between integers
and pointers. The division operation is the most complicated translation: first the operands a1 and a2
[atom]  ⟦let v = a in e[v]⟧_a
   <-->  ⟦a⟧_a v'.
         MOV v', λv.
         ⟦e[v]⟧_a

[if1]   ⟦if a then e1 else e2⟧_a
   <-->  ⟦a⟧_a test.
         CMP $0, test
         J= then ⟦e2⟧_a else ⟦e1⟧_a

[if2]   ⟦if a1 op a2 then e1 else e2⟧_a
   <-->  ⟦a1⟧_a v1.
         ⟦a2⟧_a v2.
         CMP v1, v2
         J⟦op⟧_a then ⟦e1⟧_a else ⟦e2⟧_a

[sub]   ⟦let v = a1[a2] in e[v]⟧_a
   <-->  ⟦a1⟧_a v1.
         ⟦a2⟧_a v2.
         MOV v1, λtuple.
         MOV v2, λindex'.
         SAR $1, %index', λindex.
         MOV -4(%tuple), λsize'.
         SAR $2, %size', λsize.
         CMP size, index
         JAE then bounds.error else
         MOV 0(%tuple, %index, 4), λv.
         ⟦e[v]⟧_a

Figure 5: Translation of expressions to x86 assembly


are  shifted  to  obtain  the  standard  integer  representation,  the  division  operation  is  performed,  and  the 
result  is  converted  to  a  31-bit  representation. 
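The arithmetic behind the tagged representation is short enough to check by hand. The Python sketch below mirrors the add and div translations on ordinary integers; the function names are assumptions, and only non-negative operands are considered so that shifting and division behave the same way as on the machine.

```python
def tag(i):
    """31-bit representation of i: shifted left, low bit set to 1."""
    return i * 2 + 1


def untag(w):
    """Recover the integer from its tagged word (SAR $1)."""
    return w >> 1


def tagged_add(w1, w2):
    """ADD then DEC: (2a + 1) + (2b + 1) - 1 = 2(a + b) + 1."""
    return w1 + w2 - 1


def tagged_div(w1, w2):
    """Shift both operands to plain integers, divide, then re-tag."""
    q = untag(w1) // untag(w2)   # SAR, SAR, DIV
    return (q << 1) | 1          # SHL $1, then OR $1
```

Addition needs only one correction because each operand carries one tag bit; division cannot work on tagged words directly, which is why it is the longest translation in Figure 4.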

4.2.2  Expression  translation 

Expressions translate to sequences of assembly instructions. A representative set of translations is
shown in Figure 5. The translation of let v = a in e[v] is the simplest case: the atom a is translated into
an operand v', which is copied to a variable v (since the expression e[v] assumes v is a variable), and
the rest of the code e[v] is translated. Conditionals translate into comparisons followed by a conditional
branch.

The memory operations shown in Figure 6 are among the most complicated translations. For the
runtime, we use a contiguous heap and a copying garbage collector. The representation of all memory
blocks in the heap includes a header word containing the number of bytes in the block (the number of
bytes is always a multiple of the word size), followed by one word for each field. A pointer to a block
points to the first field of the block (the word after the header word). The heap area itself is contiguous,
delimited by base and limit pointers; the next allocation point is in the next pointer. These pointers are
accessed through the context[name] pseudo-operand, which is later translated to an absolute memory
address.

During  a  subscript  operation,  shown  in  the  sub  translation,  the  index  is  compared  against  the  number 
[alloc]    ⟦let v = (tuple) in e[v]⟧_a
      <-->  reserve($|tuple|)
            MOV context[next], λv.
            ADD $(|tuple| + 1) * 4, context[next]
            MOV $|tuple| * 4, (%v)
            ADD $4, %v, λp.
            store_tuple(p, 0, tuple);
            ⟦e[v]⟧_a

[closure]  ⟦letc v = a1(a2) in e[v]⟧_a
      <-->  reserve($3)
            MOV context[next], λv.
            ADD $12, context[next]
            MOV $8, (%v)
            ⟦a1⟧_a v1.
            ⟦a2⟧_a v2.
            MOV v1, 4(%v)
            MOV v2, 8(%v)
            ADD $4, %v, λp.
            ⟦e[p]⟧_a

[call]     ⟦a(args)⟧_a
      <-->  ⟦a⟧_a closure.
            MOV 4(%closure), λenv.
            copy_args(args, ()) λvargs.
            JMP (%closure)(vargs)

Figure 6: Translation of memory operations to x86 assembly


[reserve]  reserve(i); e
      <-->  MOV context[limit], λlimit.
            SUB context[next], %limit, λfree.
            CMP i, %free
            Jb then gc(i) else e

[stuple1]  store_tuple(p, i, (a :: args)); e
      <-->  ⟦a⟧_a v.
            MOV v, i(%p)
            store_tuple(p, i + 4, args); e

[stuple2]  store_tuple(p, i, ()); e  <-->  e

[copy1]    copy_args((a :: args), vargs) λv.e[v]
      <-->  ⟦a⟧_a v'.
            MOV v', λv.
            copy_args(args, (%v :: vargs)) λv.e[v]

[copy2]    copy_args((), vargs) λv.e[v]  <-->  e[reverse(vargs)]

Figure 7: Auxiliary terms for x86 code generation
of words in the block as indicated in the header word, and a bounds-check exception is raised if the index
is out-of-bounds (denoted with the instruction JAE then bounds.error else). When a block of memory
is allocated in the alloc and closure rules, the first step reserves storage with the reserve(i) term,
and then the data is allocated and initialized. Figure 7 shows the implementation of some of the helper
terms: the reserve(i) expression determines whether sufficient storage is present for an allocation of i
bytes, and calls the garbage collector otherwise; the store_tuple(p, i, args); e term generates the code to
initialize the fields of a tuple from a set of arguments; and the copy_args(args, vargs) λv.e term copies
the argument list in args into registers.
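The block layout and the bounds check can be simulated with a small Python model of the heap. This is a sketch under several assumptions: the class and method names are invented for the example, memory is a dictionary from byte addresses to words, the reservation covers the header word as well as the fields, and a failed reservation raises an error where a real runtime would call the garbage collector.

```python
WORD = 4  # bytes per word on the 32-bit target


class Heap:
    def __init__(self, size_bytes):
        self.mem = {}              # byte address -> word
        self.base = 0
        self.next = 0              # next allocation point
        self.limit = size_bytes

    def reserve(self, nbytes):
        """Check that nbytes are free between next and limit."""
        if self.limit - self.next < nbytes:
            raise MemoryError("a real runtime would call gc(nbytes) here")

    def alloc_tuple(self, fields):
        """Allocate a block and return a pointer to its first field."""
        nbytes = len(fields) * WORD
        self.reserve(nbytes + WORD)
        header = self.next
        self.next += nbytes + WORD
        self.mem[header] = nbytes          # header word: size in bytes
        p = header + WORD                  # pointers skip the header
        for k, value in enumerate(fields):
            self.mem[p + k * WORD] = value
        return p

    def subscript(self, p, index):
        """Read field `index`, checking it against the header word."""
        size_words = self.mem[p - WORD] // WORD
        if not 0 <= index < size_words:
            raise IndexError("bounds.error")
        return self.mem[p + index * WORD]
```

A three-field tuple allocated in an empty heap occupies bytes 0 to 15: the header at 0 holds 12, and the returned pointer is 4.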

4.3  Register  allocation 

Register allocation is one of the easier phases of the compiler formally: the main objective of register
allocation is to rename the variables in the program to use register names. The formal problem is just an
α-conversion, which can be checked readily by the formal system. From a practical standpoint, however,
register allocation is an NP-complete problem, and the majority of the code in our implementation is
devoted to a Chaitin-style [2] graph-coloring register allocator. These kinds of allocators have been well
studied, and we do not discuss the details of the allocator here. The overall structure of the register
allocator algorithm is as follows.

1.  Given  a  program  p,  run  a  register  allocator  R(p). 

2.  If the register allocator R(p) was successful, it returns an assignment of variables to register names;
α-convert the program using this variable assignment, and return the result p'.

3.  Otherwise, if the register allocator R(p) was not successful, it returns a set of variables to “spill”
into memory. Rewrite the program to add fetch/store code for the spilled registers, generating a
new program p', and run register allocation R(p') on the new program.

Part 2 is a trivial formal operation (the logical framework checks that p' = p up to α-conversion). The
generation of spill code for part 3 is not trivial, however, as we discuss in the following section.
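The three steps form a loop that repeats until an assignment is found. The Python sketch below shows that control structure only. It is not the Chaitin-style allocator used in the implementation: the coloring is a simple greedy pass over an interference graph, and the `spill` function is a stand-in that removes spilled variables from the graph instead of rewriting the program with fetch/store code. All names are hypothetical.

```python
def color(graph, registers):
    """Greedy coloring of an interference graph.

    graph maps each variable to the set of variables it interferes with.
    Returns ("ok", assignment) or ("spill", {variable}).
    """
    assignment = {}
    for v in sorted(graph, key=lambda v: len(graph[v]), reverse=True):
        taken = {assignment[n] for n in graph[v] if n in assignment}
        free = [r for r in registers if r not in taken]
        if not free:
            return "spill", {v}
        assignment[v] = free[0]
    return "ok", assignment


def spill(graph, victims):
    """Stand-in for spill-code generation: drop the spilled variables."""
    return {v: ns - victims for v, ns in graph.items() if v not in victims}


def allocate(graph, registers):
    """Steps 1 to 3: run the allocator, and retry after spilling on failure."""
    spilled = set()
    while True:
        status, result = color(graph, registers)
        if status == "ok":
            return result, spilled
        spilled |= result
        graph = spill(graph, result)
```

The loop terminates because every failed attempt removes at least one variable from the graph.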

4.4  Generation  of  spill  code 

The generation of spill code can affect the performance of a program dramatically, and it is important
to minimize the amount of memory traffic. Suppose the register allocator was not able to generate
a register assignment for a program p, and instead it determines that variable v must be placed in
memory. We can allocate a new global variable, say spill_i, for this purpose, and replace all occurrences
of the variable with a reference to the new memory location. This can be captured by rewriting the
program just after the binding occurrences of the variables to be spilled. The following two rules give
an example.


[smov]    MOV o, λv.e[v]  <-->  MOV o, λspill_i.e[spill_i]

[sinst2]  inst2 o, o_r, λv.e[v]
     <-->  MOV o_r, λspill_i.
           inst2 o, spill_i; e[spill_i]

However, this kind of brute-force approach spills all of the occurrences of the variable, even those
occurrences that could have been assigned to a register. Furthermore, the spill location spill_i would
o_s  ::=  spill[v, s]              Spill operands
       |  spill[s]

e    ::=  SPILL o_r, λs.e[s]       New spill
       |  SPILL o_s, λv.e[v]       Get the spilled value

Figure 8: Spill pseudo-operands and instructions


presumably  be  represented  as  the  label  of  a  memory  location,  not  a  variable,  allowing  a  conflicting 
assignment  of  another  variable  to  the  same  spill  location. 

To address both of these concerns, we treat spill locations as variables, and introduce scoping for
spill variables. We introduce two new pseudo-operands, and two new instructions, shown in Figure 8.
The instruction SPILL o_r, λs.e[s] generates a new spill location represented in the variable s, and
stores the operand o_r in that spill location. The operand spill[v, s] represents the value in spill location
s, and it also specifies that the values in spill location s and in the register v are the same. The
operand spill[s] refers to the value in spill location s. The value in a spill operand is retrieved with
the instruction SPILL o_s, λv.e[v] and placed in the variable v.

The actual generation of spill code then proceeds in two main phases. Given a variable to spill, the
first phase generates the code to store the value in a new spill location, then adds copy instructions to
split the live range of the variable so that all uses of the variable refer to different freshly-generated
operands of the form spill[v, s]. For example, consider the following code fragment, and suppose the
register allocator determines that the variable v is to be spilled, because a register cannot be assigned
in code segment 2.


AND o, o_r, λv.
...code segment 1...
ADD %v, o
...code segment 2...
SUB %v, o
...code segment 3...
OR %v, o

The  first  phase  rewrites  the  code  as  follows.  The  initial  occurrence  of  the  variable  is  spilled  into 
a  new  spill  location  s.  The  value  is  fetched  just  before  each  use  of  the  variable,  and  copied  to  a  new 
register.  Note  that  the  later  uses  refer  to  the  new  registers,  creating  a  copying  daisy-chain,  but  the 
registers  have  not  been  actually  eliminated. 

AND o, o_r, λv1.
SPILL %v1, λs.
...code segment 1...
SPILL spill[v1, s], λv2.
ADD %v2, o
...code segment 2...
SPILL spill[v2, s], λv3.
SUB %v3, o
...code segment 3...
SPILL spill[v3, s], λv4.
OR %v4, o
Once the live range is split, the register allocator has the freedom to spill only part of the live range.
During the second phase of spilling, the allocator will determine that register v2 must be spilled in code
segment 2, and the spill[v2, s] operand is replaced with spill[s], forcing the fetch from memory, not
the register v2. Register v2 is no longer live in code segment 2, easing the allocation task without also
spilling the register in code segments 1 and 3.
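The first phase is mechanical, and a small Python sketch shows how the daisy-chain of copies arises. The instruction encoding here is an assumption made for the example: each instruction is a triple of opcode, operand strings, and the variable it defines (or `None`), the defining instruction is assumed to precede all uses, and the spill location is always named `s`.

```python
def split_live_range(code, v):
    """Spill v after its definition and fetch it into a new variable
    before every instruction that uses it.

    code is a list of (opcode, operands, defined) triples, where operands
    is a tuple of strings such as "%v" and defined is a name or None.
    """
    out = []
    count = 0
    current = None                       # most recent copy of v
    for opcode, operands, defined in code:
        if defined == v:
            count = 1
            current = f"{v}{count}"
            out.append((opcode, operands, current))
            out.append(("SPILL", (f"%{current}",), "s"))
        elif f"%{v}" in operands:
            previous = current
            count += 1
            current = f"{v}{count}"
            out.append(("SPILL", (f"spill[{previous}, s]",), current))
            renamed = tuple(f"%{current}" if o == f"%{v}" else o for o in operands)
            out.append((opcode, renamed, None))
        else:
            out.append((opcode, operands, defined))
    return out
```

Applied to the seven-instruction fragment above, this produces an eleven-instruction sequence with SPILL instructions at positions 2, 4, 7 and 10, each fetch naming the previous copy in its spill[v, s] operand.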

4.5  Formalizing  spill  code  generation 

The formalization of spill code generation can be performed in three parts. The first part generates
new spill locations (line 2 in the code sequence above); the second part generates live-range splitting
code (lines 4, 7, and 10); and the third part replaces operands of the form spill[v, s] with spill[s] when
requested by the register allocator.

The first part requires a rewrite for each kind of instruction that contains a binding occurrence of
a variable. The following two rewrites are representative examples. Note that all occurrences of the
variable v are replaced with spill[v, s], potentially generating operands like i(%spill[v, s]). These kinds
of operands are rewritten at the end of spill-code generation to their original form, e.g. i(%v).

[smov]    MOV o_r, λv.e[v]
     <-->  MOV o_r, λv.
           SPILL %v, λs.
           e[spill[v, s]]

[sinst2]  inst2 o, o_r, λv.e[v]
     <-->  inst2 o, o_r, λv.
           SPILL %v, λs.
           e[spill[v, s]]

The second rewrite splits a live range of a spill at an arbitrary point. This rewrite applies to any
program that contains an occurrence of an operand spill[v1, s], and translates it to a new program that
fetches the spill into a new register v2 and uses the new spill operand spill[v2, s] in the remainder of
the program. This rewrite is selectively applied before any instruction that uses an operand spill[v1, s].

[split]  e[spill[v1, s]]
    <-->  SPILL spill[v1, s], λv2.e[spill[v2, s]]

In the third and final phase, when the register allocator determines that a variable should be spilled,
the spill[v, s] operands are selectively eliminated with the following rewrite.

[spill]  spill[v, s]  <-->  spill[s]


4.6  Assembly  optimization 

There  are  several  simple  optimizations  that  can  be  performed  on  the  generated  assembly,  including 
dead-code  elimination  and  reserve  coalescing.  Dead-code  elimination  has  a  simple  specification:  any 
instruction  that  defines  a  new  binding  variable  can  be  eliminated  if  the  variable  is  never  used.  The 
following  rewrites  capture  this  property. 


[dmov]    MOV o, λv.e                        <-->  e
[dinst1]  inst1 o_r, λv.e                    <-->  e
[dinst2]  inst2 o, o_r, λv.e                 <-->  e
[dinst3]  inst3 o, o_r1, o_r2, λv1, v2.e     <-->  e
As we mentioned in Section 3.4, this kind of dead-code elimination should not be applied if the
instruction being eliminated can raise an exception.

Another useful optimization is the coalescing of reserve(i) instructions, which call the garbage
collector if i bytes of storage are not available. In the current version of the language, all reservations
specify a constant number of bytes of storage, and these reservations can be propagated up the expression
tree and coalesced. The first step is an upward propagation of the reserve statement. The following
rewrites illustrate the process.

[rmov]    MOV o, λv.reserve(i); e[v]
     <-->  reserve(i); MOV o, λv.e[v]

[rinst2]  inst2 o, o_r, λv.reserve(i); e[v]
     <-->  reserve(i); inst2 o, o_r, λv.e[v]

Adjacent reservations can also be coalesced.

[rres]    reserve(i1); reserve(i2); e  <-->  reserve(i1 + i2); e

Two reservations at a conditional boundary can also be coalesced. To ensure that both branches
have a reserve, it is always legal to introduce a reservation for 0 bytes of storage.

[rif]     Jcc then reserve(i1); e1 else reserve(i2); e2
     <-->  reserve(max(i1, i2)); Jcc then e1 else e2

[rzero]   e  <-->  reserve(0); e
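Taken together, these rewrites compute a single reservation for a whole block of code. The following Python sketch applies them in one recursive pass. The encoding is an assumption for the example: instructions are tuples, `("reserve", n)` stands for reserve(n), and a conditional is `("jcc", cc, then_code, else_code)`. The sketch also assumes that a reservation may be hoisted above every other instruction, whereas the text shows this only for MOV and inst2.

```python
def coalesce(code):
    """Return (total_bytes, code_without_reserves) for a block of code.

    total_bytes is the size of the single reservation that would be
    placed at the start of the block.
    """
    total = 0
    rest = []
    for inst in code:
        if inst[0] == "reserve":
            total += inst[1]                    # hoist, then [rres]
        elif inst[0] == "jcc":
            _, cc, then_code, else_code = inst
            n1, then_rest = coalesce(then_code)
            n2, else_rest = coalesce(else_code)
            total += max(n1, n2)                # [rif]; [rzero] covers a branch with no reserve
            rest.append(("jcc", cc, then_rest, else_rest))
        else:
            rest.append(inst)
    return total, rest
```

Taking the maximum over the two branches is sufficient because only one branch executes; a branch that allocates nothing contributes a reservation of 0 bytes.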

5  Summary  and  Future  Work 

One  of  the  points  we  have  stressed  in  this  presentation  is  that  the  implementation  of  formal  compilers  is 
easy,  perhaps  easier  than  traditional  compiler  development  using  a  general-purpose  language.  This  case 
study  presents  a  convincing  argument  based  on  the  authors’  previous  experience  implementing  compilers 
using  traditional  methods.  The  formal  process  was  easier  to  specify  and  implement,  and  MetaPRL 
provided  a  great  deal  of  automation  for  frequently  occurring  tasks.  In  most  cases,  the  implementation 
of  a  new  compiler  phase  meant  only  the  development  of  new  rewrite  rules.  There  is  very  little  of  the 
“grunge”  code  that  plagues  traditional  implementations,  such  as  the  maintenance  of  tables  that  keep 
track  of  the  variables  in  scope,  code-walking  procedures  to  apply  a  transformation  to  the  program’s 
subterms,  and  other  kinds  of  housekeeping  code. 

As  a  basis  of  comparison,  we  can  compare  the  formal  compiler  in  this  paper  to  a  similar  native-code 
compiler  for  a  fragment  of  the  Java  language  we  developed  as  part  of  the  Mojave  project  [7].  The  Java 
compiler is written in OCaml, and uses an intermediate representation similar to the one presented
in  this  paper,  with  two  main  differences:  the  Java  intermediate  representation  is  typed,  and  the  x86 
assembly  language  is  not  scoped. 

Figure 9 gives a comparison of some of the key parts of both compilers in terms of lines of code,
where we omit code that implements the Java type system and class constructs. The formal compiler
columns list the total lines of code for the term rewrites, as well as the total code including rewrite
strategies. The size of the total code base in the formal compiler is still quite large due to the extensive
code needed to implement the graph-coloring algorithm for the register allocator. Preliminary tests
suggest that the performance of programs generated by the formal compiler is comparable to, and sometimes
better than, that of programs generated by the Java compiler, due to a better spilling strategy.

The work presented in this paper took roughly one person-week of effort from concept to
implementation, while the Java implementation took roughly three times as long. It should be noted that,
Description            Formal compiler       Java
                       Rewrites    Total
CPS conversion             44       347       338
Closure conversion         54       410      1076
Code generation           214       648      1012
Total code base           484     10000     12000

Figure 9: Code comparison


while  the  Java  compiler  has  been  stable  for  about  a  year,  it  still  undergoes  periodic  debugging.  Register 
allocation  is  especially  problematic  to  debug  in  the  Java  compiler,  since  errors  are  not  caught  at  compile 
time,  but  typically  cause  memory  faults  in  the  generated  program. 

This work is far from complete. The current example serves as a proof of concept, but it remains
to be seen what issues will arise when the formal compilation methodology is applied to more complex
programming languages. For future work, we intend to approach the problem of developing and validating
formal compilers in three steps. The first step is the development of typed intermediate languages.
These languages admit a broader class of rewrite transformations that are conditioned on well-typed
programs, and the typed language serves as a launching point for compiler validation. The second step
is to develop a semantics of the intermediate language and validate the rewrite rules for a small source
language similar to the one presented here. It is not clear whether the same properties should be applied
to the assembly language: whether the assembly language should be typed, and whether it is feasible to
develop a simple formal model of the target architecture that will allow the code generation and register
allocation phases to be verified. The final step is to extend the source language to one resembling a
modern general-purpose language.

6  Related  work 

Higher-order abstract syntax, logical environments, and term rewriting are not individually new
to compiler implementation and validation.

Term  rewriting  has  been  successfully  used  to  describe  programming  language  syntax  and  semantics, 
and  there  are  systems  that  provide  efficient  term  representations  of  programs  as  well  as  rewrite  rules  for 
expressing  program  transformations.  For  instance,  the  ASF+SDF  environment  rb  allows  the  programmer 
to  construct  the  term  representation  of  a  wide  variety  of  programming  syntax  and  to  specify  equations 
as  rewrite  rules.  These  rewrites  may  be  conditional  or  unconditional,  and  are  applied  until  a  normal 
form  is  reached.  Using  equations,  programmers  can  specify  optimizations,  program  transformations, 
and  evaluation.  The  ASF+SDF  system  targets  the  generation  of  informal  rewriting  code  that  can  be  used 
in  a  compiler  implementation. 

Liang [10] implemented a compiler for a simple imperative language using a higher-order abstract
syntax implementation in λProlog. Liang's approach includes several of the phases we describe here,
including parsing, CPS conversion, and code generation using an instruction set defined using
higher-order abstract syntax (although in Liang's case, registers are referred to indirectly through a
meta-level store, and we represent registers directly as variables). Liang does not address the issue of
validation in this work, and the primary role of λProlog is to simplify the compiler implementation. In
contrast to our approach, in Liang's work the entire compiler was implemented in λProlog, even the
parts of the compiler where implementation in a more traditional language might have been more
convenient (such as register allocation code).

FreshML m adds to the ML language support for straightforward encoding of variable bindings and
alpha-equivalence  classes.  Our  approach  differs  in  several  important  ways.  Substitution  and  testing  for 
free  occurrences  of  variables  are  explicit  operations  in  FreshML,  while  MetaPRL  provides  a  convenient 
implicit  syntax  for  these  operations.  Binding  names  in  FreshML  are  inaccessible,  while  only  the  formal 
parts  of  MetaPRL  are  prohibited  from  accessing  the  names.  Informal  portions — such  as  code  to  print 
debugging  messages  to  the  compiler  writer,  or  warning  and  error  messages  to  the  compiler  user — can 
access  the  binding  names,  which  aids  development  and  debugging.  FreshML  is  primarily  an  effort  to 
add  automation;  it  does  not  address  the  issue  of  validation  directly. 

Previous  work  has  also  focused  on  augmenting  compilers  with  formal  tools.  Instead  of  trying  to 
split the compiler into a formal part and a heuristic part, one can attempt to treat the whole compiler
as a heuristic, adding some external code that watches over what the compiler is doing and tries to
establish the equivalence of the intermediate and final results. For example, the work of Necula and Lee
mm  has  led  to  effective  mechanisms  for  certifying  the  output  of  compilers  (e.g.,  with  respect  to  type 
and  memory-access  safety),  and  for  verifying  that  intermediate  transformations  on  the  code  preserve 
its  semantics.  While  these  approaches  certify  the  code  and  ease  the  debugging  process  (errors  can  be 
flagged  at  compile  time  rather  than  at  run-time),  it  is  not  clear  that  they  simplify  the  implementation 
task. 

There have been efforts to present more functional accounts of assembly as well. Morrisett et al.
[12] developed a typed assembly language capable of supporting many high-level programming
constructs and proof-carrying code. In this scheme, well-typed assembly programs cannot “go wrong.”


7 Source listing


The  following  sections  provide  the  programming  documentation  generated  by  MetaPRL  from  the  source 
code. 


8 M_ir module


This  module  defines  the  intermediate  language  for  the  M  language.  Here  is  the  abstract  syntax: 


(* Values *)
v ::= i                        (integers)
    | b                        (booleans)
    | v                        (variables)
    | fun v -> e               (functions)
    | (v1, v2)                 (pairs)

(* Atoms (functional expressions) *)
a ::= i                        (integers)
    | b                        (booleans)
    | v                        (variables)
    | a1 op a2                 (binary operation)
    | fun x -> e               (unnamed functions)

(* Expressions *)
e ::= let v = a in e           (LetAtom)
    | f(a)                     (TailCall)
    | if a then e1 else e2     (Conditional)
    | let v = a1[a2] in e      (Subscripting)
    | a1[a2] <- a3; e          (Assignment)

    (* These are eliminated during CPS *)
    | let v = f(a) in e        (Function application)
    | return a


A program is a set of function definitions and a program body expressed in a sequent. Each function
must be declared and defined separately.
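
As a rough illustration of the abstract syntax above, the following Python sketch models a few of the atom and expression forms as immutable records and prints them in the concrete syntax of the grammar. It is only a first-order approximation: binders are plain strings here, whereas the MetaPRL terms below use higher-order abstract syntax. The class names and the `show` helper are hypothetical and are not part of the M_ir module.

```python
from dataclasses import dataclass
from typing import Union

# Atoms: integers, variables, and binary operations on atoms.
@dataclass(frozen=True)
class Int:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Binop:
    op: str
    left: "Atom"
    right: "Atom"

Atom = Union[Int, Var, Binop]

# Expressions: only three of the forms listed in the grammar are modelled.
@dataclass(frozen=True)
class LetAtom:
    var: str
    atom: Atom
    body: "Exp"

@dataclass(frozen=True)
class If:
    cond: Atom
    then: "Exp"
    orelse: "Exp"

@dataclass(frozen=True)
class TailCall:
    fun: str
    arg: Atom

Exp = Union[LetAtom, If, TailCall]

def show(t) -> str:
    """Render a term in the concrete syntax of the grammar."""
    if isinstance(t, Int):
        return str(t.value)
    if isinstance(t, Var):
        return t.name
    if isinstance(t, Binop):
        return f"{show(t.left)} {t.op} {show(t.right)}"
    if isinstance(t, LetAtom):
        return f"let {t.var} = {show(t.atom)} in {show(t.body)}"
    if isinstance(t, If):
        return f"if {show(t.cond)} then {show(t.then)} else {show(t.orelse)}"
    if isinstance(t, TailCall):
        return f"{t.fun}({show(t.arg)})"
    raise TypeError(f"not an IR term: {t!r}")

# let x = 1 + 2 in f(x)
prog = LetAtom("x", Binop("+", Int(1), Int(2)), TailCall("f", Var("x")))
```

Calling `show(prog)` renders the term as `let x = 1 + 2 in f(x)`.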


8.1  Parents 
extends  M_util 


8.2  Terms 

Binary  operators. 

declare M_ir!AddOp (displayed as M_ir!AddOp)
declare M_ir!SubOp (displayed as M_ir!SubOp)
declare M_ir!MulOp (displayed as M_ir!MulOp)
declare M_ir!DivOp (displayed as M_ir!DivOp)
declare M_ir!LtOp (displayed as M_ir!LtOp)
declare M_ir!LeOp (displayed as M_ir!LeOp)
declare M_ir!EqOp (displayed as M_ir!EqOp)
declare M_ir!NeqOp (displayed as M_ir!NeqOp)
declare M_ir!GeOp (displayed as M_ir!GeOp)
declare M_ir!GtOp (displayed as M_ir!GtOp)


8.2.1  Atoms 

Atoms  are  values:  integers,  variables,  binary  operations  on  atoms,  and  functions. 

AtomFun  is  a  lambda- abstraction,  and  AtomFunVar  is  the  projection  of  a  function  from  a  recursive 
function  definition  (defined  below). 

declare M_ir!AtomFalse (displayed as false)

declare M_ir!AtomTrue (displayed as true)

declare M_ir!AtomInt[i:n] (displayed as #i)

declare M_ir!AtomBinop{'op; 'a1; 'a2} (displayed as a1 op a2)

declare M_ir!AtomRelop{'op; 'a1; 'a2} (displayed as a1 op a2)

declare M_ir!AtomFun{x. e['x]} (displayed as λ_a x. e[x])

declare M_ir!AtomVar{'v} (displayed as ↓v)

declare M_ir!AtomFunVar{'R; 'v} (displayed as R.v)


8.2.2  Expressions 

There  are  several  simple  kinds  of  expressions. 

declare M_ir!LetAtom{'a; v. e['v]} (displayed as let v = a in e[v])
declare M_ir!If{'a; 'e1; 'e2} (displayed as if a then e1 else e2)
declare M_ir!ArgNil (displayed as )
declare M_ir!ArgCons{'a; 'rest} (displayed as a :: rest)
declare M_ir!TailCall{'f; 'args} (displayed as tailcall f args)
declare M_ir!Length[i:n] (displayed as i)
declare M_ir!AllocTupleNil (displayed as ())
declare M_ir!AllocTupleCons{'a; 'rest} (displayed as a :: rest)
declare M_ir!LetTuple{'length; 'tuple; v. e['v]}
   (displayed as let v = [length = length] tuple in e[v])
declare M_ir!LetSubscript{'a1; 'a2; v. e['v]}
   (displayed as let v = a1[a2] in e[v])
declare M_ir!SetSubscript{'a1; 'a2; 'a3; 'e}
   (displayed as a1[a2] <- a3; e)


Reserve  statements  are  inserted  later  in  each  function  header. 

declare M_ir!Reserve[words:n]{'e}
   (displayed as reserve words words in e)
declare M_ir!Reserve[words:n]{'args; 'e}
   (displayed as reserve words words args args in e)
declare M_ir!ReserveCons{'a; 'rest}
   (displayed as M_ir!ReserveCons{a; rest})
declare M_ir!ReserveNil (displayed as )


LetApply and Return are eliminated during CPS conversion. LetClosure is like LetApply, but it is a
partial application.

declare M_ir!LetApply{'f; 'a; v. e['v]}
   (displayed as let apply v = f(a) in e[v])
declare M_ir!LetClosure{'a1; 'a2; f. e['f]}
   (displayed as let closure f = a1(a2) in e[f])
declare M_ir!Return{'a} (displayed as return(a))


8.2.3  Recursive  values 

We need some way to represent mutually recursive functions. The normal way to do this is to define a
single recursive function, and use a switch to split the different parts. The method to do this would use
a record. For example, suppose we define two mutually recursive functions f and g:

let r2 = fix{r1. record{
   field["f"]{lambda{x. (r1.g)(x)}};
   field["g"]{lambda{x. (r1.f)(x)}}}}
in
   r2.f(1)


declare M_ir!LetRec{R1. e1['R1]; R2. e2['R2]}
   (displayed as let rec R1. e1[R1] R2.in e2[R2])
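
The record-based encoding of mutual recursion can be mimicked in an ordinary language. The Python sketch below is an analogy only, not the MetaPRL encoding: a hypothetical `fix` helper builds a record (a dictionary) whose fields are functions, and each field reaches its siblings through the record variable `r1`. The even/odd bodies are chosen for illustration so that the recursion terminates.

```python
def fix(make_record):
    """Tie the recursive knot: the record is passed to its own definition."""
    record = {}
    # The field bodies only look up r1[...] when they are called, so the
    # record may still be empty while the fields are being constructed.
    record.update(make_record(record))
    return record

# Two mutually recursive fields, "f" and "g", in the style of
#    let r2 = fix{r1. record{field["f"]{...}; field["g"]{...}}} in ...
r2 = fix(lambda r1: {
    "f": lambda n: True if n == 0 else r1["g"](n - 1),   # n is even
    "g": lambda n: False if n == 0 else r1["f"](n - 1),  # n is odd
})
```

Projecting a field, as in `r2["f"](4)`, corresponds to the term `r2.f(4)`.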


Records  have  a  set  of  tagged  fields.  We  require  that  all  the  fields  be  functions. 

The  record  construction  is  recursive.  The  Label  term  is  used  for  field  tags;  the  FunDef  defines  a 
new  field  in  the  record;  and  the  EndDef  term  terminates  the  record  fields. 

declare M_ir!Fields{'fields} (displayed as { fields })
declare M_ir!Label[tag:t] (displayed as "tag")
declare M_ir!FunDef{'label; 'exp; 'rest}
   (displayed as fun label = exp rest)
declare M_ir!EndDef (displayed as )


To  simplify  the  presentation,  we  usually  project  the  record  fields  before  each  of  the  field  branches  so 
that  we  can  treat  functions  as  if  they  were  variables. 

declare M_ir!LetFun{'R; 'label; f. e['f]}
   (displayed as let fun f = R.label in e[f])


Include a term representing initialization code.

declare M_ir!Initialize{'e} (displayed as initialization e end)

8.2.4  Program  sequent  representation 

Programs are represented as sequents: ⟨declarations⟩; ⟨definitions⟩ ⊢ exp

For now the language is untyped, so each declaration has the form v : exp. A definition is an equality
judgment.

declare M_ir!exp (displayed as exp)

declare M_ir!def{'v; 'e} (displayed as v = e)

declare M_ir!compilable{'e} (displayed as compilable e end)

Some convenient keywords (used only in display forms; they have no formal meaning).

declare  M_ir!xlet  (displayed  as  let) 
declare  M_ir!xin  (displayed  as  in) 


Sequent  tag  for  the  M  language. 

declare  M_ir!m  (displayed  as  m) 


8.2.5  Subscripting. 

Tuples  are  listed  in  reverse  order. 

declare M_ir!alloc_tuple{'l1; 'l2} (displayed as l1 :: l2)
declare M_ir!alloc_tuple{'l} (displayed as M_ir!alloc_tuple{l})


9 M_cps module

CPS  conversion  for  the  M  language. 


9.1  Parents 

extends  M_ir 
extends  M_util 


9.2  Resources 
The  cps_resource 

The cps resource provides a generic method for defining the CPS transformation. The cpsC conversion
can be used to apply this evaluator.

The implementation of the cps_resource and the cpsC conversion relies on tables to store the shape
of redices, together with the conversions for the reduction.


9.2.1  Application 

Add  an  application  that  we  will  map  through  the  program.  This  should  be  eliminated  by  the  end  of 
CPS  conversion. 

•  CPSRecordVar[R] represents the application of the record R to the identity function.

•  CPSFunVar[f] represents the application of the function f to the identity function.

•  CPS[cont; e] is the CPS conversion of expression e with continuation cont. The interpretation is
as the application cont e.

•  CPS[cont. fields[cont]] is the CPS conversion of a record body. We think of a record {f1 =
e1; ...; fn = en} as a function from labels to expressions (on label fi, the function returns ei). The
CPS form is λl.λc.CPS[c; fields[l]].

•  CPS[a] is the conversion of the atom expression a (which should be the same as a, unless a
includes function variables).

declare M_cps!CPSRecordVar{'R} (displayed as CPSRecordVar[R])
declare M_cps!CPSFunVar{'f} (displayed as CPSFunVar[f])
declare M_cps!CPS{'cont; 'e} (displayed as CPS[cont; e])
declare M_cps!CPS{cont. fields['cont]}
   (displayed as CPS[cont. fields[cont]])
declare M_cps!CPS{'a} (displayed as CPS[a])


9.2.2 Formalizing CPS conversion


CPS conversion works by transformation of function application. Each rewrite in the transformation
preserves the operational semantics of the program.

For atoms, the transformation is a no-op unless the atom is a function variable. If so, the function
must be partially applied.

![] rewrite cps_atom_true {| cps |} : CPS[true] <--> true

![] rewrite cps_atom_false {| cps |} : CPS[false] <--> false

![] rewrite cps_atom_int {| cps |} : CPS[#i] <--> #i

![] rewrite cps_atom_var {| cps |} : CPS[↓v] <--> ↓v

![] rewrite cps_atom_binop {| cps |} :
   CPS[(a1 op a2)] <--> (CPS[a1] op CPS[a2])

![] rewrite cps_atom_relop {| cps |} :
   CPS[(a1 op a2)] <--> (CPS[a1] op CPS[a2])

![] rewrite cps_fun_var {| cps |} : CPS[CPSFunVar[f]] <--> ↓f

![] rewrite cps_alloc_tuple_nil {| cps |} : CPS[()] <--> ()

![] rewrite cps_alloc_tuple_cons {| cps |} :
   CPS[a :: rest] <--> CPS[a] :: CPS[rest]

![] rewrite cps_arg_cons {| cps |} :
   CPS[(a :: rest)] <--> (CPS[a] :: CPS[rest])

![] rewrite cps_arg_nil {| cps |} : CPS[()] <--> ()

![] rewrite cps_length {| cps |} : CPS[i] <--> i


CPS  transformation  for  expressions. 

![] rewrite cps_let_atom {| cps |} :
   CPS[cont; (let v = a in e[v])] <-->
   (let v = CPS[a] in CPS[cont; e[v]])

![] rewrite cps_let_tuple {| cps |} :
   CPS[cont; (let v = [length = length] tuple in e[v])] <-->
   (let v = [length = CPS[length]] CPS[tuple] in
    CPS[cont; e[v]])

![] rewrite cps_let_subscript {| cps |} :
   CPS[cont; (let v = a1[a2] in e[v])] <-->
   (let v = CPS[a1][CPS[a2]] in CPS[cont; e[v]])

![] rewrite cps_if {| cps |} :
   CPS[cont; (if a then e1 else e2)] <-->
   (if CPS[a] then CPS[cont; e1] else CPS[cont; e2])

![] rewrite cps_let_apply {| cps |} :
   CPS[cont; (let apply v = CPSFunVar[f](a2) in e[v])] <-->
   (let rec R. fun "g" = (λ_a v. CPS[cont; e[v]])
    R.in
    let fun g = R."g" in tailcall ↓f (↓g, CPS[a2]))


Converting  functions  is  the  hard  part. 

![] rewrite cps_let_rec {| cps |} :
   CPS[cont; (let rec R1. fields[R1] R2.in e[R2])] <-->
   (let rec R1.
       CPS[cont. CPS[cont; fields[CPSRecordVar[R1]]]]
    R2.in
       CPS[cont; e[CPSRecordVar[R2]]])

![] rewrite cps_fields {| cps |} :
   CPS[cont. CPS[cont; ({ fields[cont] })]] <-->
   ({ CPS[cont. CPS[cont; fields[cont]]] })

![] rewrite cps_fun_def {| cps |} :
   CPS[cont. CPS[cont; (fun label = (λ_a v. e[v]) rest)]] <-->
   (fun label = (λ_a cont. λ_a v. CPS[cont; e[v]])
    CPS[cont. CPS[cont; rest]])

![] rewrite cps_end_def {| cps |} : CPS[cont. CPS[cont; ]] <-->

![] rewrite cps_initialize {| cps |} :
   CPS[cont; (initialization e end)] <-->
   (initialization CPS[cont; e] end)

![] rewrite cps_let_fun {| cps |} :
   CPS[cont; (let fun f = CPSRecordVar[R].label in e[f])] <-->
   (let fun f = R.label in CPS[cont; e[CPSFunVar[f]]])

![] rewrite cps_return {| cps |} :
   CPS[cont; return(a)] <--> (tailcall ↓cont (CPS[a]))

![] rewrite cps_tailcall {| cps |} :
   CPS[cont; (tailcall CPSFunVar[f] args)] <-->
   (tailcall ↓f (↓cont :: CPS[args]))

![] rewrite cps_fun_var_cleanup {| cps |} :
   ↓CPSFunVar[f] <--> CPSFunVar[f]
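
To make the shape of these rewrites concrete, here is a minimal Python sketch of CPS conversion over a small tuple-encoded fragment of the language. It is an illustration under simplifying assumptions, not the MetaPRL implementation: atoms are left unchanged, binders are plain names, a hypothetical `letfun` node stands in for the `let rec R. fun "g" = ... R.in let fun g = R."g" in ...` wrapper of cps_let_apply, and fresh continuation names come from a counter.

```python
import itertools

def cps(cont, e, fresh=None):
    """CPS-convert expression e with respect to the continuation name cont."""
    if fresh is None:
        fresh = itertools.count()
    tag = e[0]
    if tag == "let":          # cps_let_atom: convert the body only
        _, v, a, body = e
        return ("let", v, a, cps(cont, body, fresh))
    if tag == "if":           # cps_if: convert both branches
        _, a, e1, e2 = e
        return ("if", a, cps(cont, e1, fresh), cps(cont, e2, fresh))
    if tag == "return":       # cps_return: call the continuation
        _, a = e
        return ("tailcall", cont, [a])
    if tag == "tailcall":     # cps_tailcall: pass the continuation along
        _, f, args = e
        return ("tailcall", f, [cont] + list(args))
    if tag == "apply":        # cps_let_apply: let v = f(a) in body
        _, v, f, a, body = e
        g = f"g{next(fresh)}"  # fresh name for the rest of the computation
        # g = fun v -> CPS[cont; body], then tail-call f with g as continuation
        return ("letfun", g, v, cps(cont, body, fresh),
                ("tailcall", f, [g, a]))
    raise ValueError(f"unknown expression: {e!r}")

# let y = f(x) in return y
src = ("apply", "y", "f", "x", ("return", "y"))
out = cps("k", src)
```

Here `out` defines a new function `g0` that receives `y` and tail-calls the original continuation `k`, and then tail-calls `f` with `g0` and `x`, so no call in the result ever returns.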


The  program  is  compilable  if  the  CPS  version  is  compilable. 

# rule cps_prog :
   [m] 1. ⟨Γ⟩
       2. cont : exp
       ⊢
       compilable
          let rec R. fun ".init" = (λ_a cont. CPS[cont; e])
          R.in
          let fun init = R.".init" in
          initialization tailcall ↓init (↓cont) end
       end -->
   [m] ⟨Γ⟩ ⊢ compilable e end


10 M_closure module

Closure conversion for the M language. The program is closed in four steps.

1. In the first step, all LetRec terms are replaced with CloseRec terms, which include an extra frame
parameter. The frame is allocated as a tuple, and the free variables are projected from the
tuple.

2. In the second step, for each CloseRec that has a free variable, the free variable is added to the
frame, and projected within the record.

3. In the third step, the CloseRec is converted back to a LetRec followed by a tuple allocation.

4. The fourth phase moves code around and generally cleans up.


10.1  Parents 

extends  M_ir 


10.2  Resources 
The closure_resource


10.2.1  Terms 


We define several auxiliary terms.

The close v = a in e[v] term is the same as let v = a in e[v]. We use a special term for variables
that are being closed.

The R[frame] term is used to wrap record variables. The term represents the partial application of
the record R to the frame variable.

The close rec R1, frame1. fields[R1; frame1] R2, frame2. tuple of length length body[R2; frame2]
is a recursive record definition. The function defined by the fields[R1; frame1] takes frame1 as
an extra argument; frame1 represents the environment containing all the functions' free variables. The
body[R2; frame2] is the rest of the program. The frame2 represents the frame to be used for the functions
in R2. The frame2 is allocated as the tuple, which has “length” fields.

close v = a1[a2] in e[v] is the same as LetSubscript, but we use a special term to guide the
closure conversion process.

λ_q frame. e[frame] is the term that adds an extra frame argument to each of the functions in the
record.


declare M_closure!CloseVar{v. e['v]; 'a}
   (displayed as close v = a in e[v])

declare M_closure!CloseRecVar{'R; 'frame} (displayed as R[frame])

declare M_closure!CloseRec
   {R1, frame1. fields['R1; 'frame1];
    R2, frame2. body['R2; 'frame2];
    'length;
    'tuple}
   (displayed as
    close rec R1, frame1. fields[R1; frame1]
    R2, frame2.
    tuple of length length
    body[R2; frame2])

declare M_closure!CloseSubscript{'frame; 'index; v. e['v]}
   (displayed as close v = frame[index] in e[v])

declare M_closure!CloseFrame{frame. e['frame]}
   (displayed as λ_q frame. e[frame])


10.2.2  Phase  1 

Convert  all  LetRec  to  CloseRec  so  that  each  function  will  have  a  frame  variable  for  its  free  variables. 

![] rewrite add_frame :
   (let rec R1. fields[R1]
    R2.in
    e[R2]) <-->
   (close rec R1, frame. fields[R1[frame]]
    R2, frame.
    () of length 0
    e[R2[frame]])


10.2.3  Phase  2 

In this phase, we first abstract free variables using inverse beta-reduction. That is, suppose we have a
recursive definition:

   close rec R, frame.
   fun f_1 = e_1
   ...
   fun f_n = e_n
   in ...


and suppose that one of the function bodies e_i has a free variable v. Then we first abstract the
variable:


   CloseVar{v. close rec ...; v}


Next, we apply the close_frame rewrite, which takes the free variable, adds it to the frame, and
projects it in the record.

   close var v = a in
   close rec R, frame.
   fun f_1 = e_1
   ...
   fun f_n = e_n
   R, frame (length = i)(args) in


   close rec R, frame.
   let v = frame[i] in
   fun f_1 = e_1
   ...
   fun f_n = e_n
   R, frame (length = i + 1)(v :: args) in


Variable  closure  is  a  beta-rewrite. 

![] rewrite reduce_beta : (close v = a in e[v]) <--> e[a]


This is the main function to lift out free variables.

declare M_closure!Length{'length} (displayed as Length(length))

![] rewrite wrap_length : Length(i_m) <--> i

![] rewrite close_frame :
   (close v = a in
    close rec R1, frame1. fields[v, R1; frame1]
    R2, frame2.
    tuple of length i
    body[v, R2; frame2]) <-->
   (close rec R1, frame1.
       close v = ↓frame1[#i] in fields[v, R1; frame1]
    R2, frame2.
    ↓a :: tuple of length Length(i +_m 1)
    let v = ↓a in body[v, R2; frame2])


Now,  a  conversional  to  apply  the  inverse-beta  reduction.  The  vars  parameter  is  the  set  of  function 
variables.  Function  variables  are  not  treated  as  free;  we  don’t  need  closure  conversion  for  them. 
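
The effect of this phase on a single function can be sketched in Python as follows. The sketch is illustrative only and makes several assumptions that differ from the rewrites above: terms are tuple-encoded with named variables, all free variables are collected in one pass instead of one close_frame step at a time, and the frame is laid out in order of first occurrence. The helper names (`merge`, `free_vars`, `close_function`) are hypothetical.

```python
def merge(xs, ys):
    """Concatenate two name lists, dropping duplicates but keeping order."""
    return list(dict.fromkeys(xs + ys))

def free_vars(t, bound=frozenset()):
    """Free variables of a term, in order of first occurrence."""
    tag = t[0]
    if tag == "int":
        return []
    if tag == "var":
        return [] if t[1] in bound else [t[1]]
    if tag == "binop":        # ("binop", op, left, right)
        return merge(free_vars(t[2], bound), free_vars(t[3], bound))
    if tag == "let":          # ("let", v, atom, body)
        return merge(free_vars(t[2], bound), free_vars(t[3], bound | {t[1]}))
    if tag == "ret":          # ("ret", atom)
        return free_vars(t[1], bound)
    raise ValueError(f"unknown term: {t!r}")

def close_function(param, body, function_names=frozenset()):
    """Close `fun param -> body`; return (closed_body, frame_contents).

    Function variables are not treated as free, so they are skipped.
    """
    fvs = [v for v in free_vars(body, frozenset({param}))
           if v not in function_names]
    closed = body
    # Wrap the body in projections: let v = frame[i] in ...
    for i, v in reversed(list(enumerate(fvs))):
        closed = ("sub", v, "frame", i, closed)
    frame = tuple(("var", v) for v in fvs)
    return closed, frame

# fun x -> return x + a * b, where a and b are free
body = ("ret", ("binop", "+", ("var", "x"),
                ("binop", "*", ("var", "a"), ("var", "b"))))
closed, frame = close_function("x", body)
```

After the call, `frame` holds the values of `a` and `b`, and `closed` begins with the two projections `let a = frame[0] in let b = frame[1] in ...`, so the function body no longer refers to any variable outside its parameter and its frame.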


10.2.4  Phase  3 

Convert  the  CloseRec  term  to  a  LetRec  plus  a  frame  allocation. 

![] rewrite close_close_rec :
   (close rec R1, frame1. fields[R1; frame1]
    R2, frame2.
    tuple of length length
    body[R2; frame2]) <-->
   (let rec R1. (λ_q frame1. fields[R1; frame1])
    R2.in
    let frame2 = [length = length] tuple in body[R2; frame2])


10.2.5  Phase  4 

Generally  clean  up  and  move  code  around. 

![] rewrite close_fields :
   (close v = a1[a2] in { fields[v] }) <-->
   ({ close v = a1[a2] in fields[v] })

![] rewrite close_fundef :
   (close v1 = a1[a2] in fun label = (λ_a v2. e[v1; v2]) rest[v1]) <-->
   (fun label = (λ_a v2. let v1 = a1[a2] in e[v1; v2])
    close v1 = a1[a2] in rest[v1])

![] rewrite close_enddef : (close v = a1[a2] in ) <-->

![] rewrite close_frame_fields :
   (λ_q frame. { fields[frame] }) <--> ({ (λ_q frame. fields[frame]) })

![] rewrite close_frame_fundef :
   (λ_q frame. fun label = (λ_a v. e[frame; v]) rest[frame]) <-->
   (fun label = (λ_a frame. λ_a v. e[frame; v])
    (λ_q frame. rest[frame]))

![] rewrite close_frame_enddef : (λ_q frame. ) <-->

![] rewrite close_let_subscript :
   (let v1 = a1[a2] in (λ_a v2. e[v1; v2])) <-->
   (λ_a v2. let v1 = a1[a2] in e[v1; v2])

![] rewrite close_initialize_1 :
   (let closure v = a1(a2) in initialization e[v] end) <-->
   (initialization let closure v = a1(a2) in e[v] end)

![] rewrite close_initialize_2 :
   (let v = [length = length] tuple in initialization e[v] end) <-->
   (initialization
    let v = [length = length] tuple in e[v]
    end)

![] rewrite close_let_fun :
   (let fun v = R[frame].label in e[v]) <-->
   (let closure v = R.label(↓frame) in e[v])

![] rewrite close_tailcall :
   (let closure g = f(frame) in tailcall ↓g args) <-->
   (tailcall f (frame :: args))


11 M_prog module


This module defines rewrites to lift closed function definitions to the top level of the program. Ideally,
these transformations would be applied after closure conversion.


11.1 Parents

extends M_ir


11.2 Resources

The prog resource provides a generic method for lifting closed function definitions
to the top level of a program. The progC conversion can be used to apply this evaluator.

The implementation of the prog_resource and the progC conversion relies on tables to store the
shape of redices, together with the conversions for the reduction.


11.3  Rewrites 

The  rewrites  for  this  transformation  are  straightforward.  They  swap  a  closed  function  definition  with 
any  expression  that  comes  before  it. 

![] rewrite letrec_atom_fun :
   (λ_a x. let rec R1. fields[R1] R2.in e[R2; x]) <-->
   (let rec R1. fields[R1]
    R2.in
    (λ_a x. e[R2; x]))

![] rewrite letrec_let_atom :
   (let v = a in let rec R1. fields[R1] R2.in e[R2; v]) <-->
   (let rec R1. fields[R1]
    R2.in
    let v = a in e[R2; v])

![] rewrite letrec_let_tuple :
   (let v = [length = length] tuple in
    let rec R1. fields[R1]
    R2.in
    e[R2; v]) <-->
   (let rec R1. fields[R1]
    R2.in
    let v = [length = length] tuple in e[R2; v])

![] rewrite letrec_let_subscript :
   (let v = a1[a2] in let rec R1. fields[R1] R2.in e[R2; v]) <-->
   (let rec R1. fields[R1]
    R2.in
    let v = a1[a2] in e[R2; v])

![] rewrite letrec_let_closure :
   (let closure v = f(a) in
    let rec R1. fields[R1]
    R2.in
    e[R2; v]) <-->
   (let rec R1. fields[R1]
    R2.in
    let closure v = f(a) in e[R2; v])

[] rewrite letrec_if_true :
   (if a then let rec R1. fields[R1]
    R2.in
    e1[R2] else e2) <-->
   (let rec R1. fields[R1]
    R2.in
    (if a then e1[R2] else e2))

[] rewrite letrec_if_false :
   (if a then e1 else let rec R1. fields[R1]
    R2.in
    e2[R2]) <-->
   (let rec R1. fields[R1]
    R2.in
    (if a then e1 else e2[R2]))

[] rewrite letrec_fun_def :
   (fun label = let rec R1. fields[R1] R2.in e[R2]
    rest) <-->
   (let rec R1. fields[R1]
    R2.in
    fun label = e[R2]
    rest)

[] rewrite letrec_fields_def :
   ({ let rec R1. fields[R1] R2.in e[R2] }) <-->
   (let rec R1. fields[R1]
    R2.in
    { e[R2] })

[] rewrite letrec_letrec :
   (let rec R1. let rec R2. fields[R2] R3.in e1[R1; R3]
    R4.in
    e2[R4]) <-->
   (let rec R2. fields[R2]
    R3.in
    let rec R1. e1[R1; R3]
    R4.in
    e2[R4])


12 M_dead module


This  module  implements  an  aggressive  form  of  dead-code  elimination.  A  let-definition  is  considered 
dead  if  the  variable  it  defines  is  not  used.  If  the  defining  value  would  normally  raise  an  exception  (e.g., 
division  by  zero),  the  semantics  of  the  program  could  change. 


12.1  Parents 

extends  M_ir 


12.2  Resources 

The  dead  resource  provides  a  generic  method  for  defining  dead  code  elimination.  The  deadC  conversion 
can  be  used  to  apply  this  evaluator. 

The implementation of the dead_resource and the deadC conversion relies on tables to store the
shape of redices, together with the conversions for the reduction.


12.3 Rewrites

The rewrites are straightforward. Note that in the redices below, v is not allowed to be free in e. Each
of these rewrites is added to the dead_resource.

![] rewrite dead_let_atom : (let v = a in e) <--> e

![] rewrite dead_let_tuple :
   (let v = [length = length] tuple in e) <--> e

![] rewrite dead_let_subscript : (let v = a1[a2] in e) <--> e

![] rewrite dead_let_closure : (let closure v = a1(a2) in e) <--> e
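
The following Python sketch shows the same idea on a tuple-encoded fragment with named variables. It is an illustration, not the MetaPRL code: the side condition "v is not free in e" is enforced by the logical framework in the rewrites above, whereas here it has to be checked explicitly with a hypothetical `free_vars` helper. The example also shows why the elimination is aggressive: a definition that would divide by zero is removed along with everything else.

```python
def atom_vars(a):
    """Variables mentioned in an atom."""
    tag = a[0]
    if tag == "var":
        return {a[1]}
    if tag == "binop":        # ("binop", op, left, right)
        return atom_vars(a[2]) | atom_vars(a[3])
    return set()              # integers mention no variables

def free_vars(e):
    """Free variables of an expression."""
    tag = e[0]
    if tag == "let":          # ("let", v, atom, body)
        _, v, a, body = e
        return atom_vars(a) | (free_vars(body) - {v})
    if tag == "ret":          # ("ret", atom)
        return atom_vars(e[1])
    raise ValueError(f"unknown expression: {e!r}")

def dead(e):
    """Drop every let whose variable is unused in its (already cleaned) body."""
    if e[0] == "let":
        _, v, a, body = e
        body = dead(body)     # clean the body first so removals can cascade
        if v not in free_vars(body):
            return body       # dead_let_atom: (let v = a in e) <--> e
        return ("let", v, a, body)
    return e

# let x = 1 / 0 in let y = x in return 7
prog = ("let", "x", ("binop", "/", ("int", 1), ("int", 0)),
        ("let", "y", ("var", "x"), ("ret", ("int", 7))))
cleaned = dead(prog)
```

Both definitions disappear and `cleaned` is just `return 7`, even though evaluating the original program would have divided by zero first.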


13  M_inline  module 

This  module  implements  a  simple  form  of  constant  folding  and  constant  inlining.  We  do  not  inline 
functions  due  to  our  somewhat  cumbersome  choice  of  representation  for  function  definitions. 


13.1  Parents 

extends M_ir


13.2 Meta-arithmetic


We use the MetaPRL built-in meta-arithmetic to fold constants. Arithmetic is performed using
meta-terms, so we need a way to convert back to a number (i.e., atom).

declare M_inline!MetaInt{'e} (displayed as Meta[e])

![] rewrite meta_int_elim {| reduce |} : Meta[i_m] <--> #i


13.3  Rewrites 

Each of the rewrites below is added to the reduce_resource. We group them into ones to perform
constant folding and ones to inline constants.

13.3.1 Constant folding

Constant folding is straightforward given the meta-arithmetic provided by MetaPRL.

![] rewrite reduce_add : (#i + #j) <--> Meta[i +_m j]

![] rewrite reduce_sub : (#i - #j) <--> Meta[i -_m j]

![] rewrite reduce_mul : (#i * #j) <--> Meta[i *_m j]

![] rewrite reduce_div : (#i / #j) <--> Meta[i ÷_m j]

13.3.2  Constant  inlining 

Constant inlining is also straightforward. We can inline the branches of conditional expressions if we
know the guards at compile time.

![] rewrite reduce_let_atom_true {| reduce |} :
   (let v = true in e[v]) <--> e[true]

![] rewrite reduce_let_atom_false {| reduce |} :
   (let v = false in e[v]) <--> e[false]

![] rewrite reduce_let_atom_int {| reduce |} :
   (let v = #i in e[v]) <--> e[#i]

![] rewrite reduce_let_atom_var {| reduce |} :
   (let v2 = ↓v1 in e[v2]) <--> e[v1]

![] rewrite reduce_if_true {| reduce |} :
   (if true then e1 else e2) <--> e1

![] rewrite reduce_if_false {| reduce |} :
   (if false then e1 else e2) <--> e2


We need these last three rewrites to ensure that the final program produced is well-formed. Variables
whose values have been inlined are rewritten to their value.

![] rewrite unfold_atom_var_true {| reduce |} : ↓true <--> true

![] rewrite unfold_atom_var_false {| reduce |} : ↓false <--> false

![] rewrite unfold_atom_var_int {| reduce |} : ↓#i <--> #i
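
As an informal illustration of how folding and inlining interact, the Python sketch below applies both to a tuple-encoded fragment. It is not the MetaPRL implementation and rests on stated assumptions: all bound names are distinct (so the explicit environment cannot capture variables, something the logical framework handles automatically), division is left out to avoid committing to a rounding or division-by-zero behavior, and the function names `fold` and `simplify` are hypothetical.

```python
import operator

# Operators folded by this sketch; division is deliberately omitted.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(a, env):
    """Substitute known constants into an atom and fold constant arithmetic."""
    tag = a[0]
    if tag == "var":
        return env.get(a[1], a)          # inline a known value, if any
    if tag == "binop":                   # ("binop", op, left, right)
        _, op, left, right = a
        left, right = fold(left, env), fold(right, env)
        if op in OPS and left[0] == "int" and right[0] == "int":
            return ("int", OPS[op](left[1], right[1]))
        return ("binop", op, left, right)
    return a                             # integers and booleans

def simplify(e, env=None):
    """Inline constant lets and resolve conditionals with known guards."""
    env = {} if env is None else env
    tag = e[0]
    if tag == "let":                     # ("let", v, atom, body)
        _, v, a, body = e
        a = fold(a, env)
        if a[0] in ("int", "bool", "var"):
            return simplify(body, {**env, v: a})   # reduce_let_atom_*
        return ("let", v, a, simplify(body, env))
    if tag == "if":                      # ("if", atom, e1, e2)
        _, a, e1, e2 = e
        a = fold(a, env)
        if a == ("bool", True):
            return simplify(e1, env)     # reduce_if_true
        if a == ("bool", False):
            return simplify(e2, env)     # reduce_if_false
        return ("if", a, simplify(e1, env), simplify(e2, env))
    if tag == "ret":                     # ("ret", atom)
        return ("ret", fold(e[1], env))
    raise ValueError(f"unknown expression: {e!r}")

# let x = 2 * 3 in let b = true in if b then return x + 1 else return 0
prog = ("let", "x", ("binop", "*", ("int", 2), ("int", 3)),
        ("let", "b", ("bool", True),
         ("if", ("var", "b"),
          ("ret", ("binop", "+", ("var", "x"), ("int", 1))),
          ("ret", ("int", 0)))))
result = simplify(prog)
```

The whole program collapses to `return 7`: the product is folded, both definitions are inlined, and the conditional is resolved because its guard is known.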


14 M_x86_asm module


This module defines our representation of x86 assembly code. The one difference here, compared to
traditional approaches, is that we continue to use variable scoping.


14.1 Parents

extends Base_theory


14.2  x86  operands 

declare M_x86_asm!ImmediateNumber[i:n] (displayed as $i)
declare M_x86_asm!ImmediateLabel[label:t]{'R} (displayed as R.label)
declare M_x86_asm!ImmediateCLabel[label:t]{'R}
   (displayed as %R.label)
declare M_x86_asm!Register{'v} (displayed as %v)
declare M_x86_asm!SpillMemory{'label} (displayed as spill[label])
declare M_x86_asm!SpillRegister{'v; 'label}
   (displayed as spill[v, label])
declare M_x86_asm!ContextRegister[label:t]
   (displayed as context[label])
declare M_x86_asm!MemReg{'r} (displayed as (%r))
declare M_x86_asm!MemRegOff[i:n]{'r} (displayed as i(%r))
declare M_x86_asm!MemRegRegOffMul[off:n, mul:n]{'r1; 'r2}
   (displayed as (%r1,%r2,off,mul))


14.3  Condition  codes 

These  condition  codes  are  used  in  the  Jcc  (conditional  jump)  instruction  below. 

declare M_x86_asm!CC["lt":s] (displayed as lt)
declare M_x86_asm!CC["le":s] (displayed as le)
declare M_x86_asm!CC["z":s] (displayed as z)
declare M_x86_asm!CC["nz":s] (displayed as nz)
declare M_x86_asm!CC["gt":s] (displayed as gt)
declare M_x86_asm!CC["ge":s] (displayed as ge)
declare M_x86_asm!CC["b":s] (displayed as b)
declare M_x86_asm!CC["be":s] (displayed as be)
declare M_x86_asm!CC["a":s] (displayed as a)
declare M_x86_asm!CC["ae":s] (displayed as ae)


14.4 Instructions

We want the assembly to have a “semi-functional” property, meaning that registers are immutable. The
register allocator will coalesce registers, creating implicit assignments in the process.

This  presents  an  interesting  problem  for  the  x86,  since  it  uses  the  two-operand  instruction  form.  To 
get  around  this,  we  define  a  normal  two-operand  instruction  set  for  memory  operands.  Then  we  define 
a  three-operand  set  for  register  destination  operands.  Again,  the  allocator  is  responsible  for  making 
sure  the  dst  and  the  first  src  register  are  the  same. 

Further,  for  simplicity,  we  categorize  instructions  into  several  kinds: 

•  Mov: defines a new register from an arbitrary operand

•  Inst1[opname]: a normal one-operand instruction

•  Inst2[opname]: a normal two-operand instruction

•  Inst3[opname]: a MUL/DIV instruction

•  Shift[opname]: a shift instruction

•  Cmp[opname]: a comparison; both operands are sources

•  Set[opname]: the set/cc instruction
declare M_x86_asm!Mov{'src; dst. rest['dst]}
   (displayed as mov src, %dst /* LET */ rest[dst])
declare M_x86_asm!Spill[opcode:s]{'src; dst. rest['dst]}
   (displayed as M_x86_asm!Spill[opcode:s]{src; dst. rest[dst]})
declare M_x86_asm!Inst1[opcode:s]{'dst; 'rest}
   (displayed as opcode dst /* Memory operand */ rest)
declare M_x86_asm!Inst1[opcode:s]{'src; dst. rest['dst]}
   (displayed as opcode src, %dst rest[dst])
declare M_x86_asm!Inst2[opcode:s]{'src; 'dst; 'rest}
   (displayed as opcode src, dst /* Memory operand */ rest)
declare M_x86_asm!Inst2[opcode:s]{'src1; 'src2; dst. rest['dst]}
   (displayed as opcode src1, src2, dst rest[dst])
declare M_x86_asm!Inst3[opcode:s]{'src1; 'src2; 'src3; dst2, dst3. rest['dst2; 'dst3]}
   (displayed as opcode src1, src2, %dst2, %dst3 rest[dst2; dst3])

declare M_x86_asm!Shift[opcode:s]{'src; 'dst; 'rest}
   (displayed as opcode src, dst /* Memory operand */ rest)
declare M_x86_asm!Shift[opcode:s]{'src1; 'src2; dst. rest['dst]}
   (displayed as opcode src1, src2, %dst rest[dst])
declare M_x86_asm!Cmp[opcode:s]{'src1; 'src2; 'rest}
   (displayed as opcode src1, src2 rest)
declare M_x86_asm!Set[opcode:s]{'cc; 'dst; rest['dst]}
   (displayed as M_x86_asm!Set[opcode:s]{cc; dst; rest[dst]})
declare M_x86_asm!Set[opcode:s]{'cc; 'src; dst. rest['dst]}
   (displayed as opcode[cc] src, %dst rest[dst])
declare M_x86_asm!AsmArgNil (displayed as ())
declare M_x86_asm!AsmArgCons{'a; 'rest} (displayed as a :: rest)
declare M_x86_asm!Jmp[opcode:s]{'label; 'args}
   (displayed as opcode label(args))
declare M_x86_asm!Jcc[opcode:s]{'cc; 'rest1; 'rest2}
   (displayed as opcode[cc] begin rest1 end rest2)


AsmReserve is a pseudo-instruction that calls the garbage collector to ensure that the specified number
of words is available. The parameters are the live registers (normally the parameters to the current function).

declare M_x86_asm!AsmReserve[words:n]{'params}
   (displayed as reserve words words args(params) in)


The  Comment  instruction  is  not  a  real  instruction.  It  is  used  to  include  a  comment  in  the  program 
output;  the  text  is  given  in  the  string  parameter. 

declare M_x86_asm!Comment[comment:s]{'rest}
   (displayed as /* Comment: comment */ rest)


The  program  initialization  is  wrapped  in  the  Init  term;  we  don’t  include  the  initialization  code  in  the 
program  output. 

declare M_x86_asm!Init{'rest} (displayed as initialize rest end)


14.5 Programs

A  program  is  a  set  of  recursive  definitions,  just  like  it  is  in  the  IR.  The  labels  in  the  assembly  correspond 
to  functions,  and  the  register  allocator  is  responsible  for  ensuring  that  the  calling  convention  is  respected. 

declare M_x86_asm!LabelAsm[label:t]{'R} (displayed as R.label:)
declare M_x86_asm!LabelRec{R1. fields['R1]; R2. rest['R2]}
   (displayed as
      /* LabelRecFields[R1] begins here */
      fields[R1]
      /* LabelRecFields[R1] ends here */
      /* LabelRecBody[R2] begins here */
      rest[R2])
declare M_x86_asm!LabelDef{'label; 'code; 'rest}
   (displayed as label code rest)
declare M_x86_asm!LabelEnd (displayed as )
declare M_x86_asm!LabelFun{v. insts['v]}
   (displayed as /* param v */ insts[v])


15 M_x86_codegen module

This  module  implements  the  translation  of  IR  terms  to  x86  assembly. 


15.1  Parents 

extends M_ir
extends M_x86_frame


15.2  Terms 

We  define  several  terms  to  represent  the  assembly  translation.  All  these  terms  are  eliminated  by  the 
end  of  the  translation  process. 


•  assemble e end represents the translation of IR expressions into sequences of assembly
instructions.

•  let v = assemble(a) in e[v] represents the translation of an IR atom into an assembly operand,
which in turn is substituted for variable v in e[v].

•  assemble args[ src = args1 dst = args2 as v ] in e[v] represents the translation of IR function
arguments into assembly operands.

•  assemble [R] e end represents the translation of the mutually recursive IR functions in record
R and the rest of the program.

declare M_x86_codegen!ASM{'e} (displayed as assemble e end)
declare M_x86_codegen!ASM{'a; v. e['v]}
   (displayed as let v = assemble(a) in e[v])
declare M_x86_codegen!ASM{'args1; 'args2; v. e['v]}
   (displayed as assemble args[ src = args1 dst = args2 as v ] in e[v])
declare M_x86_codegen!ASM{'R; 'e} (displayed as assemble [R] e end)


Helper  terms  to  store  fields  into  a  tuple. 

declare M_x86_codegen!store_tuple{'v; 'tuple; 'rest}
   (displayed as store_tuple[ v, tuple ] rest)
declare M_x86_codegen!store_tuple{'v; 'off; 'tuple; 'rest}
   (displayed as store_tuple[ v, off, tuple ] rest)

rewrite unfold_store_tuple {| reduce |} :
   store_tuple[ v, tuple ]
   rest <-->
   /* Comment: unfold_store_tuple */
   mov %v, %p /* LET */
   store_tuple[ p, 0, tuple ]
   rest

rewrite unfold_store_tuple_cons {| reduce |} :
   store_tuple[ v, off, a :: tl ]
   rest <-->
   /* Comment: unfold_store_tuple_cons */
   let v1 = assemble(a) in
   mov v1, off < v > /* Memory operand */
   store_tuple[ v, (off + $word_size), tl ]
   rest

rewrite unfold_store_tuple_nil {| reduce |} :
   store_tuple[ v, off, () ] rest <--> rest


Terms  used  to  reverse  the  order  of  the  atoms  in  tuples. 

declare M_x86_codegen!reverse_tuple{'tuple}
   (displayed as reverse_tuple[ tuple ])
declare M_x86_codegen!reverse_tuple{'dst; 'src}
   (displayed as reverse_tuple[ src = src dst = dst ])

rewrite unfold_reverse_tuple {| reduce |} :
   reverse_tuple[ tuple ] <--> reverse_tuple[ src = tuple dst = () ]

rewrite reduce_reverse_tuple_cons {| reduce |} :
   reverse_tuple[ src = a :: rest dst = dst ] <-->
   reverse_tuple[ src = rest dst = a :: dst ]

rewrite reduce_reverse_tuple_nil {| reduce |} :
   reverse_tuple[ src = () dst = dst ] <--> dst
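
These three rewrites are the familiar accumulator-style list reversal. As a rough illustration only, the same computation can be written in Python; the function name and the use of Python tuples in place of the `::` and `()` terms are assumptions of this sketch.

```python
def reverse_tuple(src, dst=()):
    """Reverse src by repeatedly moving its head onto the front of dst."""
    # Calling with the default dst=() plays the role of unfold_reverse_tuple.
    while src:
        a, rest = src[0], src[1:]
        src, dst = rest, (a,) + dst   # reduce_reverse_tuple_cons
    return dst                        # reduce_reverse_tuple_nil
```

Each loop iteration corresponds to one application of the cons rewrite, so the number of rewrite steps is linear in the length of the tuple.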


Reverse  the  order  of  arguments. 

declare M_x86_codegen!reverse_args{'args}
   (displayed as reverse_args[ args ])
declare M_x86_codegen!reverse_args{'args1; 'args2}
   (displayed as reverse_args[ src = args2 dst = args1 ])

rewrite unfold_reverse_args {| reduce |} :
   reverse_args[ args ] <--> reverse_args[ src = args dst = () ]

rewrite reduce_reverse_args_cons {| reduce |} :
   reverse_args[ src = a :: rest dst = args ] <-->
   reverse_args[ src = rest dst = a :: args ]

rewrite reduce_reverse_args_nil {| reduce |} :
   reverse_args[ src = () dst = args ] <--> args


Copy  the  arguments  into  registers. 

declare M_x86_codegen!copy_args{'args; v. e['v]}
   (displayed as let v = copy_args[ args ] in e[v])
declare M_x86_codegen!copy_args{'args1; 'args2; v. e['v]}
   (displayed as let v = copy_args[ src = args2 dst = args1 ] in e[v])

rewrite unfold_copy_args {| reduce |} :
   let v = copy_args[ args ] in
   e[v] <-->
   let v = copy_args[ src = args dst = () ] in
   e[v]

rewrite reduce_copy_args_cons {| reduce |} :
   let v = copy_args[ src = a :: rest dst = dst ] in
   e[v] <-->
   mov a, %arg /* LET */
   let v = copy_args[ src = rest dst = %arg :: dst ] in
   e[v]

rewrite reduce_copy_args_nil {| reduce |} :
   let v = copy_args[ src = () dst = dst ] in
   e[v] <-->
   e[reverse_args[ dst ]]


Assemble  function  arguments. 

rewrite asm_arg_cons {| reduce |} :
   assemble args[ src = (a :: rest) dst = args as v ] in
   e[v] <-->
   let arg = assemble(a) in
   assemble args[ src = rest dst = arg :: args as v ] in
   e[v]

rewrite asm_arg_nil {| reduce |} :
   assemble args[ src = () dst = args as v ] in
   e[v] <-->
   e[reverse_args[ args ]]


15.3  Code  generation 

In  our  runtime,  integers  are  shifted  to  the  left  and  use  the  upper  31  bits  only.  The  least  significant  bit 
is  set  to  1,  so  that  we  can  distinguish  between  integers  and  pointers. 

15.3.1  Atoms 

Booleans  are  translated  to  integers.  We  use  the  standard  encodings,  0  for  false  and  1  for  true,  which  in 
our  integer  representation  translate  to  1  and  3,  respectively. 

rewrite asm_atom_false {| reduce |} :
   let v = assemble(false) in e[v] <--> e[$1]

rewrite asm_atom_true {| reduce |} :
   let v = assemble(true) in e[v] <--> e[$3]


Integers are adjusted for our runtime representation.

rewrite asm_atom_int {| reduce |} :
   let v = assemble(#i) in e[v] <--> e[$( (i * 2 + 1) )]
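
The representation can be made concrete with a few lines of Python. This is only an illustrative sketch: the helper names are hypothetical, and Python's unbounded integers stand in for 32-bit machine words, so overflow is ignored.

```python
def encode_int(i):
    """Runtime word for the integer i: shifted left, low bit set to 1."""
    return i * 2 + 1

def decode_int(word):
    """Recover the integer with an arithmetic right shift (like sar $1)."""
    return word >> 1

def is_int(word):
    """Integers have the least significant bit set."""
    return word & 1 == 1

FALSE_WORD = encode_int(0)   # false is 0, represented as 1
TRUE_WORD = encode_int(1)    # true is 1, represented as 3
```

The tag test `is_int` is what lets the runtime tell integers from pointers, under the assumption (as on x86 with word-aligned blocks) that pointers have a zero low bit.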


Variables  are  translated  to  registers. 

rewrite asm_atom_var {| reduce |} :
   let v2 = assemble(↓v1) in e[v2] <--> e[%v1]


Function  labels  become  labels. 

rewrite asm_atom_fun_var {| reduce |} :
   let v2 = assemble((R."label")) in e[v2] <--> e[%R.label]


Functions are assembled.

rewrite asm_atom_fun {| reduce |} :
   assemble (λa v. e[v]) end <--> /* param v */ assemble e[v] end


Binary  operators  are  translated  to  a  sequence  of  assembly  instructions  that  implement  that  operation. 
Note  that  each  operation  is  followed  by  adjusting  the  result  so  that  it  conforms  with  our  31-bit  integer 
representation. 

rewrite asm_atom_binop_add {| reduce |} :
   let v = assemble((a1 + a2)) in
   e[v] <-->
   /* Comment: asm_atom_binop_add */
   let v1 = assemble(a1) in
   let v2 = assemble(a2) in
   add v2, v1, sumtmp
   dec %sumtmp, %sum
   e[%sum]

rewrite asm_atom_binop_sub {| reduce |} :
   let v = assemble((a1 - a2)) in
   e[v] <-->
   /* Comment: asm_atom_binop_sub */
   let v1 = assemble(a1) in
   let v2 = assemble(a2) in
   sub v2, v1, difftmp
   inc %difftmp, %diff
   e[%diff]
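
The adjustment steps follow from the representation: (2a + 1) + (2b + 1) = 2(a + b) + 2, which is one more than the encoding of a + b, and (2a + 1) - (2b + 1) = 2(a - b), which is one less than the encoding of a - b. The Python sketch below checks this arithmetic; the function names are hypothetical and machine-word overflow is ignored.

```python
def tag(i):
    """Runtime word for the integer i (assumed encoding: 2*i + 1)."""
    return 2 * i + 1

def tagged_add(w1, w2):
    sumtmp = w1 + w2      # add
    return sumtmp - 1     # dec restores the single tag bit

def tagged_sub(w1, w2):
    difftmp = w1 - w2     # sub cancels both tag bits
    return difftmp + 1    # inc puts the tag bit back
```

In both cases a single increment or decrement is enough, so addition and subtraction never need to untag their operands.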


In multiplication and division we first obtain the standard integer representation, perform the
appropriate operation, and adjust the result.

rewrite asm_atom_binop_mul {| reduce |} :
   let v = assemble((a1 * a2)) in
   e[v] <-->
   /* Comment: asm_atom_binop_mul */
   let v1 = assemble(a1) in
   let v2 = assemble(a2) in
   sar $1, v1, %v1int
   sar $1, v2, %v2int
   imul %v1int, %v2int, prodtmp1
   shl $1, %prodtmp1, %prodtmp2
   or $1, %prodtmp2, prod
   e[%prod]

rewrite asm_atom_binop_div {| reduce |} :
   let v = assemble((a1 / a2)) in
   e[v] <-->
   /* Comment: asm_atom_binop_div */
   let v1 = assemble(a1) in
   let v2 = assemble(a2) in
   sar $1, v1, %v1tmp
   sar $1, v2, %v2tmp
   mov $0, %v3tmp /* LET */
   div %v2tmp, %v1tmp, %quottmp1, %remtmp
   shl $1, %quottmp1, %quottmp2
   or $1, %quottmp2, quot1
   e[%quot1]
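
The untag, operate, retag pattern can be sketched in Python as follows. The names are hypothetical, overflow is ignored, and the division sketch is only meant for non-negative operands, since Python's `//` and the x86 `div` instruction treat negative values differently.

```python
def tag(i):
    """Runtime word for the integer i (assumed encoding: 2*i + 1)."""
    return 2 * i + 1

def tagged_mul(w1, w2):
    v1int = w1 >> 1                 # sar $1: untag first operand
    v2int = w2 >> 1                 # sar $1: untag second operand
    prodtmp1 = v1int * v2int        # imul
    return (prodtmp1 << 1) | 1      # shl $1, then or $1: retag

def tagged_div(w1, w2):
    v1tmp = w1 >> 1
    v2tmp = w2 >> 1
    quottmp1 = v1tmp // v2tmp       # div (non-negative operands assumed)
    return (quottmp1 << 1) | 1
```

Unlike addition and subtraction, the tag bits cannot simply be corrected afterwards here, which is why both operands are shifted down before the operation.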

Assembling  IR  relational  operators  is  a  mapping  to  the  appropriate  condition  codes.  The  operations 
themselves  become  assembly  comparisons. 

rewrite asm_eq {| reduce |} : assemble M_ir!EqOp end <--> z

rewrite asm_neq {| reduce |} : assemble M_ir!NeqOp end <--> nz

rewrite asm_lt {| reduce |} : assemble M_ir!LtOp end <--> l

rewrite asm_le {| reduce |} : assemble M_ir!LeOp end <--> le

rewrite asm_gt {| reduce |} : assemble M_ir!GtOp end <--> g

rewrite asm_ge {| reduce |} : assemble M_ir!GeOp end <--> ge

rewrite asm_atom_relop {| reduce |} :
   let v = assemble((a1 op a2)) in
   e[v] <-->
   /* Comment: asm_atom_relop */
   let v1 = assemble(a1) in
   let v2 = assemble(a2) in
   cmp v1, v2
   mov $0, %eqsrc /* LET */
   set[assemble op end] %eqsrc, %eqdst
   e[%eqdst]


Reserve  memory. 

rewrite asm_reserve_1 {| reduce |} :
   assemble (reserve reswords words args params in e) end <-->
   mov context[limit], %limit /* LET */
   sub context[next], %limit, bytes
   cmp $( (reswords * $word_size) ), %bytes
   j[b] begin
      assemble args[ src = params dst = () as v ] in
      reserve reswords words args(v) in
   end
   assemble e end

rewrite asm_reserve_2 {| reduce |} :
   assemble (reserve words words in e) end <--> assemble e end
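
The test generated by asm_reserve_1 compares the free space between the next and limit pointers with the requested size, and enters the reserve block only when the space is insufficient. A minimal Python sketch of that decision is given below; the function name is hypothetical and the 4-byte word size is an assumption based on 32-bit x86.

```python
WORD_SIZE = 4  # assumed word size in bytes (32-bit x86)

def needs_gc(next_ptr, limit, reswords):
    """True when fewer than reswords words are free between next and limit."""
    bytes_free = limit - next_ptr            # sub context[next], %limit, bytes
    return bytes_free < reswords * WORD_SIZE # cmp ..., %bytes followed by j[b]
```

With this reading, a reserve of zero words can never trigger a collection.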


The translation of LetAtom is straightforward: we first translate the atom a into an operand v1, which
is then moved into v.

rewrite asm_let_atom {| reduce |} :
   assemble (let v = a in e[v]) end <-->
   /* Comment: asm_let_atom */
   let v1 = assemble(a) in
   mov v1, %v /* LET */
   assemble e[v] end


Conditionals  are  translated  into  a  comparison  followed  by  a  conditional  branch. 

rewrite asm_if_1 {| reduce |} :
   assemble (if a then e1 else e2) end <-->
   /* Comment: asm_if_1 */
   let test = assemble(a) in
   cmp $0, test
   j[z] begin assemble e2 end end
   assemble e1 end


If the condition is a relational operation, we carry over the relational operator to the conditional jump.

rewrite asm_if_2 {| reduce |} :
   assemble (if a1 op a2 then e1 else e2) end <-->
   /* Comment: asm_if_2 */
   let v1 = assemble(a1) in
   let v2 = assemble(a2) in
   cmp v2, v1
   j[assemble op end] begin assemble e1 end end
   assemble e2 end


Reading from memory involves assembling the pointer to the appropriate block and the index within
that block. We then fetch the value from the specified memory location and move it into v.

rewrite asm_let_subscript_1 {| reduce |} :
   assemble (let v = a1[a2] in e[v]) end <-->
   /* Comment: asm_let_subscript */
   let v1 = assemble(a1) in
   let v2 = assemble(a2) in
   mov v1, %tuple /* LET */
   mov v2, %indextmp /* LET */
   sar $1, %indextmp, %index
   mov < v1, index, 0, $word_size >, %v /* LET */
   assemble e[v] end

rewrite asm_let_subscript_2 {| reduce |} :
   assemble (let v = a1[#i] in e[v]) end <-->
   /* Comment: asm_let_subscript */
   let v1 = assemble(a1) in
   mov v1, %tuple /* LET */
   mov (i * $word_size) < tuple >, %v /* LET */
   assemble e[v] end

Changing a memory location involves assembling the block pointer and the index within the block. The
value to be written is assembled and moved into the specified memory location.

rewrite asm_set_subscript_1 {| reduce |} :
   assemble (a1[a2] <- a3; e) end <-->
   /* Comment: asm_set_subscript */
   let v1 = assemble(a1) in
   let v2 = assemble(a2) in
   mov v1, %tuple /* LET */
   mov v2, %indextmp /* LET */
   sar $1, %indextmp, %index
   let v3 = assemble(a3) in
   mov v3, < v1, index, 0, $word_size > /* Memory operand */
   assemble e end

rewrite asm_set_subscript_2 {| reduce |} :
   assemble (a1[#i] <- a3; e) end <-->
   /* Comment: asm_set_subscript */
   let v1 = assemble(a1) in
   mov v1, %tuple /* LET */
   let v3 = assemble(a3) in
   mov v3, (i * $word_size) < v1 > /* Memory operand */
   assemble e end
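
Both the read and the write compute the same element address: the block pointer plus the untagged index scaled by the word size. The Python sketch below models memory as a dictionary from byte addresses to words; the names and the 4-byte word size are assumptions made for illustration.

```python
WORD_SIZE = 4  # assumed word size in bytes (32-bit x86)

def element_address(tuple_ptr, tagged_index):
    """Address of an element, given the block pointer and a tagged index."""
    index = tagged_index >> 1               # sar $1 removes the integer tag
    return tuple_ptr + index * WORD_SIZE    # base + index * word_size

def load(memory, tuple_ptr, tagged_index):
    """Model of the subscript read (the let-subscript case)."""
    return memory[element_address(tuple_ptr, tagged_index)]

def store(memory, tuple_ptr, tagged_index, value):
    """Model of the subscript write (the set-subscript case)."""
    memory[element_address(tuple_ptr, tagged_index)] = value
```

When the index is a constant #i, the shift disappears and the offset i * word_size can be computed at compile time, which is what the second rewrite of each pair does.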


Allocating  a  tuple  involves  obtaining  a  block  from  the  store  by  advancing  the  next  pointer  by  the  size 
of  the  tuple  (plus  its  header),  creating  the  header  for  the  new  block,  and  storing  the  tuple  elements  in 
that  block. 

rewrite asm_alloc_tuple {| reduce |} :
   assemble (let v = [length = i] tuple in e[v]) end <-->
   /* Comment: asm_alloc_tuple */
   mov context[next], %v /* LET */
   add $( ((i + 1) * $word_size) ), context[next] /* Memory operand */
   mov header[i], (%v) /* Memory operand */
   add $( $word_size ), %v, p
   store_tuple[ p, reverse_tuple[ tuple ] ]
   assemble e[p] end
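
This is a bump allocator: the block is carved out at the current next pointer, which is then advanced past the header and the fields. A rough Python model is shown below. The dictionary-based memory, the `("header", n)` placeholder, the function name, and the 4-byte word size are all assumptions of the sketch, and the field-reversal step is omitted (fields are taken to be in storage order already).

```python
WORD_SIZE = 4  # assumed word size in bytes (32-bit x86)

def alloc_tuple(memory, context, fields):
    """Allocate a block for fields and return the pointer to its first field."""
    v = context["next"]                                 # mov context[next], %v
    context["next"] += (len(fields) + 1) * WORD_SIZE    # add ..., context[next]
    memory[v] = ("header", len(fields))                 # mov header[i], (%v)
    p = v + WORD_SIZE                                   # add $word_size, %v, p
    for off, a in enumerate(fields):                    # store_tuple
        memory[p + off * WORD_SIZE] = a
    return p
```

Note that the pointer handed to the rest of the program points just past the header, so element 0 sits at offset 0 from it.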


Allocating  a  closure  is  similar  to  2-tuple  allocation. 

rewrite asm_let_closure {| reduce |} :
   assemble (let closure v = a1(a2) in e[v]) end <-->
   /* Comment: asm_let_closure */
   mov context[next], %v /* LET */
   add $( (3 * $word_size) ), context[next] /* Memory operand */
   mov header[2], (%v) /* Memory operand */
   let v1 = assemble(a1) in
   let v2 = assemble(a2) in
   mov v1, $word_size < v > /* Memory operand */
   mov v2, (2 * $word_size) < v > /* Memory operand */
   add $( $word_size ), %v, p
   assemble e[p] end


Assembling tail-calls to IR functions involves assembling the function arguments and jumping to the
appropriate function label.

rewrite asm_tailcall_direct {| reduce |} :
   assemble (tailcall R."label" args) end <-->
   /* Comment: asm_tailcall_direct */
   assemble args[ src = args dst = () as args1 ] in
   let args2 = copy_args[ args1 ] in
   jmp R.label(args2)

rewrite asm_tailcall_indirect {| reduce |} :
   assemble (tailcall a args) end <-->
   /* Comment: asm_tailcall */
   let closuretmp = assemble(a) in
   assemble args[ src = args dst = () as args1 ] in
   mov closuretmp, %closure /* LET */
   mov 4(%closure), %env /* LET */
   let args2 = copy_args[ args1 ] in
   jmp (%closure)(%env :: args2)


An  IR  program  is  a  set  of  recursive  functions.  These  are  assembled  and  identified  by  function  labels. 

rewrite asm_letrec {| reduce |} :
   assemble (let rec R1. fields[R1] R2. in e[R2]) end <-->
   /* Comment: asm_letrec */
   /* LabelRecFields[R1] begins here */
   assemble [R1] fields[R1] end
   /* LabelRecFields[R1] ends here */
   /* LabelRecBody[R2] begins here */
   assemble e[R2] end

rewrite asm_fields {| reduce |} :
   assemble [R] ({ fields }) end <--> assemble [R] fields end

rewrite asm_fun_def {| reduce |} :
   assemble [R] (fun "label" = e rest) end <-->
   R.label: assemble e end
   assemble [R] rest end

rewrite asm_end_def {| reduce |} : assemble [R] end <-->

rewrite asm_initialize {| reduce |} :
   assemble (initialization e end) end <-->
   initialize assemble e end end


The  program  is  compilable  if  the  assembly  is  compilable. 

rule codegen_prog :
   [m] ⟨Γ⟩ ⊢ compilable assemble e end end
   [m] ⟨Γ⟩ ⊢ compilable e end


16  M_x86_opt  module 

This  module  implements  some  easy  assembly  optimizations,  including  dead  instruction  elimination  and 
removal  of  null  reserves. 


16.1  Parents 

extends M_x86_asm
extends M_util


16.2  Resources 

The before_ra resource provides a generic method for defining rewrites that may be applied before
register allocation. The before_raC conversion can be used to apply this evaluator.

The implementation of the before_ra_resource and the before_raC conversion relies on tables to
store the shape of redices, together with the conversions for the reduction.

The after_ra resource and corresponding conversion after_raC are similar.

16.3  Rewrites 

16.3.1  Dead  instruction  elimination 

Dead  instructions,  i.e.  those  instructions  that  define  a  variable  that  is  not  used  in  the  rest  of  the 
program,  may  be  eliminated.  The  rewrites  below  are  aggressive;  the  program’s  semantics  could  change 
if  an  instruction  that  can  raise  an  exception  is  eliminated.  These  rewrites  are  added  to  the  before_ra 
resource,  although  they  may  be  applied  after  register  allocation  as  well. 

rewrite dead_mov : mov src, %dst /* LET */ e <--> e

rewrite dead_inst1 : opcode src, %dst e <--> e

rewrite dead_inst2 : opcode src1, src2, dst e <--> e

rewrite dead_inst3 : opcode src1, src2, %dst2, %dst3 e <--> e

rewrite dead_shift : opcode src1, src2, %dst e <--> e

rewrite dead_set : opcode[cc] src, %dst e <--> e
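
In each rewrite the rest of the program e does not mention the destination, so the binding instruction can be dropped. Outside the framework, the same idea on a flat instruction list looks like a backward liveness pass. The Python sketch below is an illustration under stated assumptions: the `(opcode, sources, destination)` triples and the function name are hypothetical, and registers are assumed to be immutable, so each destination is defined exactly once.

```python
def eliminate_dead(instrs, live_out):
    """Drop instructions whose destination is never used afterwards.

    instrs is a list of (opcode, srcs, dst) triples; dst is None for
    instructions that define no register (memory writes, comparisons).
    """
    live = set(live_out)
    kept = []
    for opcode, srcs, dst in reversed(instrs):
        if dst is not None and dst not in live:
            continue                 # dst is unused in the rest: dead
        live.update(srcs)            # the sources of a kept instruction are used
        kept.append((opcode, srcs, dst))
    kept.reverse()
    return kept
```

Like the rewrites, this sketch is aggressive: it does not check whether a removed instruction could have raised an exception.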


16.3.2  Null  reserve  elimination 

Null reserves may be eliminated from the program. The rewrite below is added to the after_ra resource
since it is valid only after register allocation.

rewrite delete_null_reserve :
   cmp a1, a2
   opcode[cc] begin reserve 0 words args(params) in end
   rest <-->